TO
PROFESSOR RICHARD SKEMP 


whose theories on the learning of mathematics have been 
a constant source of inspiration 


PREFACE TO THE SECOND EDITION 


The world has moved on since the first edition of this book was written
on typewriters in 1976. For a start, the default use of male pronouns
is quite rightly frowned upon. Educationally, research has revealed
new insights into how individuals learn to think mathematically as they build 
on their previous experience (see [3]).¹ We have used these insights to add
comments that encourage the reader to reflect on their own understanding, 
thereby making more sense of the subtleties of the formal definitions. We 
have also added an appendix on self-explanation (written by Lara Alcock, 
Mark Hodds, and Matthew Inglis of the Mathematics Education Centre, 
Loughborough University) which has been demonstrated to improve long- 
term performance in making sense of mathematical proof. We thank the 
authors for their permission to reproduce their advice in this text. 

The second edition has much in common with the first, so that teachers 
familiar with the first edition will find that most of the original content and 
exercises remain. However, we have taken a significant step forward. The 
first edition introduced ideas of set theory, logic, and proof and used them 
to start with three simple axioms for the natural numbers to construct the 
real numbers as a complete ordered field. We generalised counting to con- 
sider infinite sets and introduced infinite cardinal numbers. But we did not 
generalise the ideas of measuring where units could be subdivided to give an 
ordered field. 

In this edition we redress the balance by introducing a new Part IV that
retains the chapter on infinite cardinal numbers while adding a new chapter 
on how the real numbers as a complete ordered field can be extended to a 
larger ordered field. 

This is part of a broader vision of formal mathematics in which certain 
theorems called structure theorems prove that formal structures have natural 
interpretations that may be represented using visual imagination and symbolic
manipulation. For instance, we already know that the formal concept of
a complete ordered field may be represented visually as points on a number 
line or symbolically as infinite decimals to perform calculations. 


1 Numbers in square brackets refer to entries in the References and Further Reading 
sections on page 383. 
Structure theorems offer a new vision of formal mathematics in which formally
defined concepts may be represented in visual and symbolic ways that
appeal to our human imagination. This will allow us to picture new ideas 
and operate with them symbolically to imagine new possibilities. We may 
then seek to provide formal proof of these possibilities to extend our theory 
to combine formal, visual, and symbolic modes of operation. 

In Part IV, chapter 12 opens with a survey of the broader vision. Chap- 
ter 13 introduces group theory, where the formal idea of a group—a set with 
an operation that satisfies a particular list of axioms—is developed to prove 
a structure theorem showing that elements of the group operate by permut- 
ing the elements of the underlying set. This structure theorem enables us to 
interpret the formal definition of a group in a natural way using algebraic 
symbolism and geometric visualisation. 

Following chapter 14 on infinite cardinal numbers from the first edition, 
chapter 15 uses the completeness axiom for the real numbers to prove a sim- 
ple structure theorem for any ordered field extension K of the real numbers. 
This shows that K must contain elements k that satisfy k > r for all real 
numbers r, which we may call ‘infinite elements’, and these have inverses 
h = 1/k that satisfy 0 < h < r for all positive real numbers r, which may be 
called ‘infinitesimals’. (There are corresponding notions of negative infinite 
numbers k satisfying k < r for all negative real numbers r.) The structure 
theorem also proves that any finite element k in K (meaning b < k < c for some
real numbers b, c) must be of the form a + h where a is a real number and h is
zero or an infinitesimal. This allows us to visualise the elements of the larger 
field K as points on a number line. The clue lies in using the magnification 
m : K → K given by m(x) = (x − a)/h, which maps a to 0 and a + h to 1,
scaling up infinitesimal detail around a to be able to see it at a normal scale. 
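
The decomposition a + h and the magnification m can be tried out on a toy model. The following is a minimal sketch under stated assumptions, not the construction used in the book: elements are modelled as polynomials in a single positive infinitesimal e (playing the role of h), with rational coefficients standing in for real numbers, so it covers only finite elements (there is no 1/e here). The class Eps and the function magnify are hypothetical names chosen for illustration.

```python
from fractions import Fraction


class Eps:
    """Hypothetical toy model: c0 + c1*e + c2*e**2 + ... where e is a positive
    infinitesimal and the rational coefficients stand in for real numbers."""

    def __init__(self, *coeffs):
        c = [Fraction(x) for x in coeffs]
        while c and c[-1] == 0:  # drop trailing zero coefficients
            c.pop()
        self.c = c

    def __sub__(self, other):
        n = max(len(self.c), len(other.c))
        a = self.c + [Fraction(0)] * (n - len(self.c))
        b = other.c + [Fraction(0)] * (n - len(other.c))
        return Eps(*(x - y for x, y in zip(a, b)))

    def sign(self):
        # The lowest-order nonzero term dominates because e is infinitesimal.
        for x in self.c:
            if x != 0:
                return 1 if x > 0 else -1
        return 0

    def __lt__(self, other):
        return (other - self).sign() > 0

    def __eq__(self, other):
        return self.c == other.c

    def standard_part(self):
        """The real part a in x = a + h, where h is infinitesimal or zero."""
        return self.c[0] if self.c else Fraction(0)


def magnify(x, a):
    """m(x) = (x - a)/e, defined here only when x - a is infinitesimal or zero."""
    d = x - a
    if d.standard_part() != 0:
        raise ValueError("x is not infinitely close to a")
    return Eps(*d.c[1:])  # dividing by e shifts every coefficient down one place


e = Eps(0, 1)       # the infinitesimal e itself, playing the role of h
a = Eps(2)          # the real number 2
x = Eps(2, 3, 5)    # the finite element 2 + 3e + 5e^2
print(e < Eps(Fraction(1, 10**9)))  # True
print(x.standard_part())            # 2
print(magnify(a, a) == Eps(0), magnify(Eps(2, 1), a) == Eps(1))  # True True
print(magnify(x, a).c)              # [Fraction(3, 1), Fraction(5, 1)]
```

In this model the detail 3e + 5e² clustered around 2 is invisible at normal scale, but after magnification it becomes 3 + 5e, whose real part 3 can be seen as an ordinary point.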

This possibility often comes as a surprise to mathematicians who have 
worked only within the real numbers where there are no infinitesimals. However,
in the larger ordered field we can now see infinitesimal quantities as points
on an extended number line by magnifying the picture.

This reveals two entirely different ways of generalising number concepts, 
one generalising counting, the other generalising the full arithmetic of the 
real numbers. It offers a new vision in which axiomatic systems may be de- 
fined to have consistent structures within their own context yet differing 
systems may be extended to give larger systems with different properties. 
Why should we be surprised? The system of whole numbers does not have 
multiplicative inverses, but the field of real numbers does have multiplica- 
tive inverses for all non-zero elements. Each extended system has properties 
that are relevant to its own particular context. This releases us from the 
limitations of our real-world experience to use our imagination to develop
powerful new theories. 

The first edition of the book took students from their familiar experience 
in school mathematics to the more precise mathematical thinking in pure 
mathematics at university. This second edition allows a further vision of the 
wider world of mathematical thinking in which formal definitions and proof 
lead to amazing new ways of defining, proving, visualising, and symbolising 
mathematics beyond our previous expectations. 


Ian Stewart and David Tall 
Coventry 2015 
PREFACE TO THE FIRST EDITION


This book is intended for readers in transition from school mathematics
to the fully-fledged type of thinking used by professional
mathematicians. It should prove useful to first-year students in universities
and colleges, and to advanced students in school contemplating
further study in pure mathematics. It should also be of interest to a wider 
class of reader with a grounding in elementary mathematics seeking an 
insight into the foundational ideas and thought processes of mathematics. 

The word ‘foundations’, as used in this book, has a broader meaning than 
it does in the building trade. Not only do we base our mathematics on these 
foundations: they make themselves felt at all levels, as a kind of cement which 
holds the structure together, and out of which it is fabricated. The founda- 
tions of mathematics, in this sense, are often presented to students as an 
extended exercise in mathematical formalism: formal mathematical logic, 
formal set theory, axiomatic descriptions of number systems, and technical 
constructions of them; all carried out in an exotic and elaborate symbolism. 
Sometimes the ideas are presented ‘informally’ on the grounds that complete 
formalism is too difficult for the delicate flowering student. This is usually 
true, but for an entirely different reason. 

A purely formal approach, even with a smattering of informality, is psy- 
chologically inappropriate for the beginner, because it fails to take account of 
the realities of the learning process. By concentrating on the technicalities, at 
the expense of the manner in which the ideas are conceived, it presents only 
one side of the coin. The practising mathematician does not think purely 
in a dry and stereotyped symbolism: on the contrary, his thoughts tend to 
concentrate on those parts of a problem which his experience tells him are 
the main sources of difficulty. While he is grappling with them, logical rig- 
our takes a secondary place: it is only after a problem has, to all intents and 
purposes, been solved intuitively that the underlying ideas are filled out into 
a formal proof. Naturally there are exceptions to this rule: parts of a prob- 
lem may be fully formalised before others are understood, even intuitively; 
and some mathematicians seem to think symbolically. Nonetheless, the basic 
force of the statement remains valid. 

The aim of this book is to acquaint the student with the way that a practis- 
ing mathematician tackles his subject. This involves including the standard 
‘foundations’ material; but our aim is to develop the formal approach as a
natural outgrowth of the underlying pattern of ideas. A sixth-form student 
has a broad grasp of many mathematical principles, and our aim is to make 
use of this, honing his mathematical intuition into a razor-sharp tool which 
will cut to the heart of a problem. Our point of view is diametrically opposed 
to that where (all too often) the student is told ‘Forget all you’ve learned up 
till now, it’s wrong, we'll begin again from scratch, only this time we'll get it 
right’. Not only is such a statement damaging to a student’s confidence: it is 
also untrue. Further, it is grossly misleading: a student who really did forget 
all he had learned so far would find himself in a very sorry position. 

The psychology of the learning process imposes considerable restraints on 
the possible approaches to a mathematical concept. Often it is simply not 
appropriate to start with a precise definition, because the content of the def- 
inition cannot be appreciated without further explanation, and the provision 
of suitable examples. 

The book is divided into four parts to make clear the mental attitude re- 
quired at each stage. Part I is at an informal level, to set the scene. The first 
chapter develops the underlying philosophy of the book by examining the 
learning process itself. It is not a straight, smooth path; it is of necessity a 
rough and stony one, with side-turnings and blind alleys. The student who 
realises this is better prepared to face the difficulties. The second chapter ana- 
lyses the intuitive concept of a real number as a point on the number line, 
linking this to the idea of an infinite decimal, and explaining the importance 
of the completeness property of the real numbers. 

Part II develops enough set theory and logic for the task in hand, looking in 
particular at relations (especially equivalence relations and order relations) 
and functions. After some basic symbolic logic we discuss what ‘proof’ con- 
sists of, giving a formal definition. Following this we analyse an actual proof 
to show how the customary mathematical style relegates routine steps to a 
contextual background—and quite rightly so, inasmuch as the overall flow 
of the proof becomes far clearer. Both the advantages and the dangers of this 
practice are explored. 

Part III is about the formal structure of number systems and related con- 
cepts. We begin by discussing induction proofs, leading to the Peano axioms 
for natural numbers, and show how set-theoretic techniques allow us to con- 
struct from them the integers, rational numbers, and real numbers. In the 
next chapter we show how to reverse this process, by axiomatising the real 
numbers as a complete ordered field. We prove that the structures obtained 
in this way are essentially unique, and link the formal structures to their in- 
tuitive counterparts of part I. Then we go on to consider complex numbers, 
quaternions, and general algebraic and mathematical structures, at which 
point the whole vista of mathematics lies at our feet. A discussion of infinite
cardinals, motivated by the idea of counting, leads towards more advanced 
work. It also hints that we have not yet completed the task of formalising our 
ideas. 

Part IV briefly considers this final step: the formalisation of set theory. 
We give one possible set of axioms, and discuss the axiom of choice, the 
continuum hypothesis, and Gödel’s theorems.

Throughout we are more interested in the ideas behind the formal facade 
than in the internal details of the formal language used. A treatment suitable 
for a professional mathematician is often not suitable for a student. (A series 
of tests carried out by one of us with the aid of first-year undergraduates 
makes this assertion very clear indeed!) So this is not a rigidly logical 
development from the elements of logic and set theory, building up a 
rigorous foundation for mathematics (though by the end the student will 
be in a position to appreciate how this may be achieved). Mathematicians 
do not think in the orthodox way that a formal text seems to imply. The 
mathematical mind is inventive and intricate; it jumps to conclusions: it 
does not always proceed in a sequence of logical steps. Only when everything 
is understood does the pristine logical structure emerge. To show a student 
the finished edifice, without the scaffolding required for its construction, is 
to deprive him of the very facilities which are essential if he is to construct 
mathematical ideas of his own. 


I.S. and D.T.


Warwick 
October 1976 
PART I
The Intuitive Background


The first part of the book reflects on the experiences that the reader will have 
encountered in school mathematics, using them as a basis for a more sophisticated
logical approach that precisely captures the structure of mathematical
systems. 

Chapter 1 considers the learning process itself to encourage the reader to 
be prepared to think in new ways to make sense of a formal approach. As new 
concepts are encountered, familiar approaches may no longer be sufficient to 
deal with them and the pathway may have side-turnings and blind alleys that 
need to be addressed. It is essential for the reader to reflect on these new 
situations and to prepare a new overall approach. 

Using a ‘building’ metaphor, we are surveying the territory to see how we 
can use our experience to build a firm new structure in mathematics that will 
make it strong enough to support higher levels of development. In a ‘plant’ 
metaphor, we are considering the landscape, the quality of the soil, and the 
climate to consider how we can operate to guarantee that the plants we grow 
have sound roots and predictable growth. 

Chapter 2 focuses on the intuitive visual concept of a real number as a 
point on a number line and the corresponding symbolic representation as an 
infinite decimal, leading to the need to formulate a definition for the com- 
pleteness property of the real numbers. This will lead in the long term to 
surprising new ways of seeing the number line as part of a wider programme 
to study the visual and symbolic representations of formal structures that 
bring together formal, visual, and symbolic mathematics into a coherent 
framework. 


CHAPTER 1 


Mathematical Thinking 


Mathematics is not an activity performed by a computer in a vacuum.
It is a human activity performed in the light of centuries of
human experience, using the human brain, with all the strengths
and deficiencies that this implies. You may consider this to be a source of 
inspiration and wonder, or a defect to be corrected as rapidly as possible, as 
you wish; the fact remains that we must come to terms with it. 

It is not that the human mind cannot think logically. It is a question of 
different kinds of understanding. One kind of understanding is the logical, 
step-by-step way of understanding a formal mathematical proof. Each individual
step can be checked, but this may give no idea of how the steps fit together,
of the broad sweep of the proof, or of the reasons that led to it being thought
of in the first place.

Another kind of understanding arises by developing a global viewpoint, 
from which we can comprehend the entire argument at a glance. This in- 
volves fitting the ideas concerned into the overall pattern of mathematics, 
and linking them to similar ideas from other areas. Such an overall grasp of 
ideas allows the individual to make better sense of mathematics as a whole 
and has a cumulative effect: what is understood well at one stage is more 
likely to form a sound basis for further development. On the other hand, 
simply learning how to ‘do’ mathematics, without having a wider grasp of its 
relationships, can limit the flexible ways in which mathematical knowledge 
can be used. 

The need for overall understanding is not just aesthetic or educational. 
The human mind tends to make errors: errors of fact, errors of judgement, 
errors of interpretation. In the step-by-step method we might not notice 
that one line is not a logical consequence of preceding ones. Within the 
overall framework, however, if an error leads to a conclusion that does not 
fit into the total picture, the conflict will alert us to the possibility of a 
mistake. 


For instance, given a column of a hundred ten-digit numbers to add up, 
where the correct answer is 137568304452, we might make an arithmetical 
error and get 137568804452 instead. When copying this answer we might 
make a second error and write 1337568804452. Both of these errors could 
escape detection. Spotting the first would almost certainly need a step-by- 
step check of the calculation. The second error, however, is easily detected 
because it does not fit into the overall pattern of arithmetic. A sum of 100 
ten-digit numbers has at most twelve digits (since 100 × 9999999999 =
999999999900), whereas the final proposed answer has thirteen.
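
The overall check just described needs no step-by-step recalculation, only a bound on the size of the answer. A minimal Python sketch of it follows; the function name plausible_sum is hypothetical, and the check assumes the numbers being added are non-negative.

```python
MAX_TEN_DIGIT = 10**10 - 1      # 9999999999, the largest ten-digit number
BOUND = 100 * MAX_TEN_DIGIT     # 999999999900, the largest possible sum


def plausible_sum(answer: int, count: int = 100, digits: int = 10) -> bool:
    """Hypothetical helper: return False if `answer` has more digits than any
    sum of `count` non-negative numbers of at most `digits` digits can have."""
    bound = count * (10**digits - 1)
    return len(str(answer)) <= len(str(bound))


print(len(str(BOUND)))               # 12
print(plausible_sum(137568804452))   # True: the first error slips through
print(plausible_sum(1337568804452))  # False: thirteen digits is impossible
```

As in the text, this overall check catches the copying error but not the arithmetical one; only a step-by-step recalculation would reveal that.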

It is a combination of step-by-step and overall understanding that has the 
best chance of detecting mistakes; not just in numerical work, but in all areas 
of human understanding. The student must develop both kinds, in order 
to appreciate the subject fully and be an effective practitioner. Step-by-step 
understanding is fairly easy; just take one thing at a time and do lots of ‘drill’ 
exercises until the idea sinks in. Overall understanding is much harder; it 
involves taking a lot of individual pieces of information and making a coher- 
ent pattern out of them. What is worse is that having developed a particular 
pattern which suits the material at one stage, new information may arise 
which seems to conflict. The new information may be erroneous but it often 
happens that previous experiences that worked in one situation no longer 
operate in a new context. The more radical the new information is, the more 
likely that it does not fit, and that the existing overall viewpoint has to be 
modified. That is what this first chapter is about. 


Concept Formation 


When thinking about any area of mathematics, it helps to understand a little 
about how we learn new ideas. This is especially true of foundational issues, 
which involve revisiting ideas that we already think we know. When we dis- 
cover that we do not—more precisely, that there are basic questions that we 
have not been exposed to—we may feel uncomfortable. If so, it’s good to 
know that we are not alone: it happens to nearly everyone. 

All mathematicians were very young when they were born. This platitude 
has a non-trivial implication: even the most sophisticated mathematician 
must have passed through the complex process of building up mathematical 
concepts. When first faced with a problem or a new concept, the mathem- 
atician turns it over in the mind, digging into personal experiences to see if 
it is like something that has been encountered before. This exploratory, cre- 
ative phase of mathematics is anything but logical. It is only when the pieces 
begin to fit together and the mathematician gets a ‘feel’ for the concept, or 
the problem, that a semblance of order emerges. Definitions are formulated
in ways that can be used for deduction, and there is a final polishing phase 
where the essential facts are marshalled into a neat and economical proof. 

As a scientific analogy, consider the concept ‘colour’. A dictionary defin- 
ition of this concept looks something like ‘the sensation produced in the eye 
by rays of decomposed light’. We do not try to teach the concept of colour to 
a child by presenting them with this definition. (‘Now, Angela, tell me what
sensation is produced in your eye by the decomposed light radiating from
this lollipop ...’) First you teach the concept ‘blue’. To do this you show a
blue ball, a blue door, a blue chair, and so on, accompanying each with the 
word ‘blue’. You repeat this with ‘red’, ‘yellow’, and so on. After a while the 
child begins to get the idea; you point to an object they have not seen be- 
fore and their response is ‘blue’. It is relatively easy to refine this to ‘dark 
blue’, ‘light blue’, and so forth. After repeating this procedure many times, 
to establish the individual colours, you start again. ‘The colour of that door
is blue. The colour of this box is red. What colour is that buttercup?’ If the
response is ‘yellow’ then the concept ‘colour’ is beginning to develop.

As a child develops and learns scientific concepts they may eventually be 
shown a spectrum obtained by passing light through a prism. This may lead 
to learning about the wavelength of light, and, as a fully fledged scientist, 
being able to say with precision which wavelength corresponds to light of a 
particular colour. The understanding of the concept ‘colour’ is now highly 
refined, but it does not help the scientist to explain to a child what ‘blue’ is. 
The existence of a precise and unambiguous definition of ‘blue’ in terms of 
wavelength is of no use at the concept-forming stage. 

It is the same with mathematical concepts. The reader already has a large 
number of mathematical concepts established in their mind: how to solve a 
quadratic equation, how to draw a graph, how to sum a geometric progres- 
sion. They have great facility in arithmetical calculations. Our aim is to build 
on this wealth of mathematical understanding and to refine these concepts 
to a more sophisticated level. To do this we use examples, drawn from the 
reader’s experience, to introduce new concepts. Once these concepts are es- 
tablished, they become part of a richer experience upon which we can again 
draw to aim even higher. 

Although it is certainly possible to build up the whole of mathematics 
by axiomatic methods starting from the empty set, using no outside infor- 
mation whatsoever, it is also totally unintelligible to anyone who does not 
already understand the mathematics being built up. An expert can look at 
a logical construction in a book and say ‘I guess that thing there is meant 
to be “zero”, so that thing is “one”, that’s “two”, ... this load of junk must 
be the integers, ... what’s that? Oh, I think I see: it must be “addition” ...’.
The non-expert is faced with an indecipherable mass of symbols. It is never
sufficient to define a new concept without giving enough examples to ex- 
plain what it looks like and what can be done with it. Of course, an expert 
is often in a position to supply their own examples, and may not need 
much help. 


Schemas 


A mathematical concept, then, is an organised pattern of ideas that are some- 
how interrelated, drawing on the experience of concepts already established. 
Psychologists call such an organised pattern of ideas a ‘schema’. For instance, 
a young child may learn to count (‘one, two, three-four-five, once I caught a 
fish alive’) progressing to ideas like ‘two sweets’, ‘three dogs’, . . . and eventu- 
ally discovers that two sweets, two sheep, two cows have something in com- 
mon, and that something is ‘two’. He or she builds a schema for the concept 
‘two’ and this schema involves the experience that everyone has two hands, 
two feet, last week we saw two sheep in a field, the fish-alive rhyme goes ‘one, 
two, ...’, and so on. It is really quite amazing how much information the 
brain has lumped together to form the concept, or the schema. 

The child progresses to simple arithmetic (‘If you have five apples and you
give two away, how many will you have left?’) and eventually builds up a
schema to handle the problem ‘What is five minus two?’ Arithmetic has very
precise properties. If 3 and 2 make 5, then 5 take away 2 leaves 3. The child 
discovers these properties by trying to make sense of arithmetic. It then be- 
comes possible to use known facts to derive new facts. If the child knows that 
8 plus 2 makes 10, then 8 plus 5 can be thought of as 8 plus 2 plus 3, so the 
sum is 10 plus 3, which is 13. Over time the child can build up a rich schema 
of whole number arithmetic. 

At this point, if you ask ‘What is five minus six?’ the response is likely
to be ‘You can’t do it’, or perhaps just an embarrassed giggle that an adult 
should ask such a silly question. This is because the question does not fit 
the child’s schema for subtraction: when thinking about ‘five apples, take 
six away’, this simply cannot be done. At a later stage, experiencing nega- 
tive numbers will give the answer ‘minus one’. What has happened? The 
child’s original schema for ‘subtraction’ has been modified to accommodate 
new ideas—perhaps by thermometer scales, or the arithmetic of banking, or 
whatever—and the understanding of the concept changes. During the pro- 
cess of change, confusing problems will arise (what does minus one apple 
look like?) which may eventually be resolved satisfactorily (apples don’t 
behave like thermometer readings). 
A large part of the learning process involves making an existing schema
more sophisticated, so that it can take account of new ideas. This process, 
as we have said, may be accompanied by a state of confusion. If it were 
possible to learn mathematics without becoming confused, life would be 
wonderful. 

Unfortunately, the human mind does not seem to work that way. More 
than 2000 years ago, Euclid supposedly told King Ptolemy I that ‘There is no
royal road to geometry’. The next best thing is to recognise not just the confusion,
but also its causes. At various stages in reading this book the reader
will be confused. Sometimes, no doubt, the cause will be the authors’ slop- 
piness, but often it will be the process of modifying personal knowledge to 
make sense of a more general situation. This type of confusion is creative, 
and it should be welcomed as a sign that progress is being made—unless 
it persists for too long. By the same token, once the confusion is resolved, 
a sudden clarity can appear with a feeling of great pleasure that the pieces 
fit together perfectly like a jigsaw. It is this feeling of perfect harmony that 
makes mathematics not only a challenge, but also an endeavour that leads to 
deep aesthetic satisfaction. 


An Example 


This way to develop new ideas is illustrated by the historical development of 
mathematical concepts—itself a learning process, but involving many minds 
instead of one. When negative numbers were first introduced, they met considerable
opposition: ‘You can’t have less than nothing’. Yet nowadays, in
this financial world of debits and credits, negative numbers are a part of 
everyday life. 

The development of complex numbers is another example. Like all math- 
ematicians, Gottfried Leibniz knew that the square of a positive number or 
of a negative number must always be positive. If i is the square root of minus 
one, then i² = −1, so i cannot be a positive or a negative number. Leibniz believed
that it should therefore be endowed with great mystical significance:
a non-zero number neither less than zero nor greater than zero. This led to 
enormous confusion and distrust concerning complex numbers; it persists 
to this day in some quarters. 

Complex numbers do not fit readily into many people’s schema for ‘num- 
ber’, and students often reject the concept when it is first presented. Modern 
mathematicians look at the situation with the aid of an enlarged schema in 
which the facts make sense. 
Imagine the real numbers marked on a line in the usual way:


[Figure: the real number line, marked at ..., −5, −4, −3, −2, −1, 0, ...]
Fig. 1.1 The real numbers


Negative numbers are to the left of zero, positive to the right. Where does i 
go? It can’t go to the left; it can’t go to the right. The people whose schema 
does not allow complex numbers must argue thus: this means that it can’t 
go anywhere. There is no place on the line where we can mark i, so it’s not a 
number. 

However, there’s an alternative. We can visualise complex numbers as the 
points of a plane. (In 1758 François Daviet de Foncenex stated that it was
pointless to think of imaginary numbers as forming a line at right angles to 
the real line. Fortunately others disagreed.) The real numbers lie along the 
‘x-axis’, the number i lies one unit above the origin along the ‘y-axis’, and the 
number x + iy lies x units along the real line and then y units above it (change 
directions for negative x or y). The objection to i (‘it can’t lie anywhere on the 
line’) is countered by the observation that it doesn’t. It lies one unit above the 
line. The enlarged schema can accommodate the disturbing facts without any 
trouble. 
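
Python's built-in complex type happens to follow the same plane picture, so it can serve as a quick sketch of it. The helper to_point below is a hypothetical name, introduced only to show that x + iy corresponds to the point (x, y).

```python
def to_point(z: complex) -> tuple[float, float]:
    """Hypothetical helper: the point (x, y) of the plane representing z = x + iy."""
    return (z.real, z.imag)


i = 1j
print(to_point(i))       # (0.0, 1.0): one unit above the origin
print(i * i)             # (-1+0j): the square of i is minus one
print(to_point(3 - 2j))  # (3.0, -2.0): 3 units along the real line, 2 below it
```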


[Figure: the complex plane, with i, 2i, 3i and −2i marked on the vertical axis and a point x + iy plotted]
Fig. 1.2 Putting i in its place


This happens quite often in mathematics. When a particular situation is 
generalised to a new context, some properties operate in the same way as be- 
fore, such as addition and multiplication both being commutative. But other 
properties (such as the order properties of real numbers) that work well in 
the original schema are no longer relevant in the extended schema (in this 
case the schema of complex numbers). 
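
The split between properties that carry over and properties that do not can be illustrated with a short Python sketch, assuming Python's complex type as a stand-in for the complex numbers. Addition and multiplication still commute. No order is possible: if i > 0 then i·i > 0, and if i < 0 then (−i)·(−i) > 0, yet both squares equal −1. Python accordingly refuses to compare complex numbers; the helper is_orderable is a hypothetical name.

```python
z, w = 2 + 3j, -1 + 4j
print(z + w == w + z, z * w == w * z)  # True True: both operations still commute

# Both i*i and (-i)*(-i) equal -1, so neither i > 0 nor i < 0 can hold
# in an ordering where squares of nonzero elements are positive.
print(1j * 1j == -1, (-1j) * (-1j) == -1)  # True True


def is_orderable(p, q) -> bool:
    """Hypothetical helper: True if Python allows the comparison p < q."""
    try:
        p < q
        return True
    except TypeError:
        return False


print(is_orderable(2.0, 3.0), is_orderable(1j, 0))  # True False
```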
This is a very general phenomenon; it has happened not only to stu-
dents, but to mathematicians throughout history, up to the present day. If 
you work in an established situation where the ideas have been fully sorted 
out, and the methods used are sufficient to solve all of the usual prob- 
lems, it is not that difficult to teach an apprentice the trade. All you need 
is to grasp the current principles and develop fluency in the methods. But 
when there is a genuine change in the nature of the system, as happened 
when negative numbers were introduced in a world that only used natural 
counting numbers, or when complex numbers were encountered solving 
equations, then there is a genuine period of confusion for everyone. What 
are these newfangled things? They certainly don’t work the way I expected 
them to! 

This can cause deep confusion. Some conquer it by engaging with the ideas 
in a determined and innovative fashion; others suffer a growing feeling of 
anxiety, even revulsion and rejection. 

One such major occasion began in the final years of the nineteenth cen- 
tury and transformed the mathematics of the twentieth and twenty-first 
centuries. 


Natural and Formal Mathematics 


Mathematics began historically with activities such as counting objects 
and measuring quantities, dealing with situations in the natural world. 
The Greeks realised that drawing figures and counting pebbles had 
more profound properties, and they built up the method of Euclidean 
proof in geometry and the theory of prime numbers in arithmetic. Even 
though they developed a Platonic form of mathematics that imagined 
perfect figures and perfect numbers, their ideas were still linked to na- 
ture. This attitude continued for millennia. When Isaac Newton studied 
the force of gravity and the movement of the heavenly bodies, science 
was known as ‘natural philosophy’. He built his ideas about calculus on 
Greek geometry, and on algebra that generalised the natural operations of 
arithmetic. 

The reliance on ‘naturally occurring’ mathematics continued until the late 
nineteenth century, when the focus changed from the properties of objects 
and operations to the development of formal mathematics based on set- 
theoretic definition and logical proof. This historical transition from natural 
to formal mathematics involved a radical change of viewpoint, leading to 
far more powerful insights into mathematical thinking. It plays an essential 
role in the shift from school geometry and algebra to formal mathematics at 
university. 
Building Formal Ideas on Human Experience


As mathematics becomes more sophisticated, new concepts often involve 
some ideas that generalise, but others that operate in new ways. As the 
transition is made from school mathematics to formal mathematics, it may 
seem logical to start anew with formal definitions and learn how to make 
formal deductions from first principles. However, experience over the last 
half-century has shown that this is not a sensible idea. In the 1960s, schools 
tried a new approach to mathematics, based on set theory and abstract def- 
initions. This ‘new math’ failed because, although experts might understand 
the abstract subtleties, learners need to build up a coherent schema of know- 
ledge to make sense of the definitions and proofs. We now know more about 
how humans learn to think mathematically. This lets us give examples from 
practical research to show how students have interpreted ideas in ways that 
are subtly different from what is intended in the printed text. We mention 
this to encourage you to think carefully about the precise meanings involved, 
and to develop strong mathematical links between ideas. 

It is helpful to read proofs carefully and to get into the habit of explaining 
to yourself why the definitions are phrased as they are and how each line of a 
proof follows from previous lines. (See the Appendix on Self-Explanation on 
page 377.) Recent research [3] has shown that students who make an effort to 
think through theorems for themselves benefit in the long run. Eye-tracking 
equipment has been used to study how students read pages from the first edi- 
tion of this very book. There is a strong correlation between spending longer 
considering significant steps in a proof and obtaining higher marks on tests 
administered at a later stage. It’s a no-brainer really. A stronger effort at mak- 
ing personal links gives you a more coherent personal schema of knowledge 
that will be of benefit in the long run. 

You need to be sensible about how to proceed. In practice, it is not always 
possible to give a precise, dictionary definition for every concept encoun- 
tered. We may talk about a set being ‘a well-defined collection of objects’, 
but we will be begging the question, since ‘collection’ and ‘set’ mean the same 
thing. 

When studying the foundations of mathematics, we must be prepared to 
become acquainted with new ideas by degrees, rather than by starting from 
a watertight definition that can be assimilated at once. As we continue along 
that path, our understanding of an idea can become more sophisticated. We 
can sometimes reach a stage where the original vague definition can be refor- 
mulated in a rigorous context (‘yellow is the colour of light with a wavelength 
of 5500 Å’). The new definition, seemingly so much better than the vague
ideas that led to its formulation, has a seductive charm. 
Wouldn't it be so much better to start from this nice, logical definition?

The short answer is ‘no’. 

In this book, we begin in Part I with ideas that you have met in school. 
We consider the visual number line, and how it is built up by marking 
various number systems, such as the whole numbers, 1, 2, 3, ...; then frac- 
tions between adjacent whole numbers; then signed numbers to the right 
and left of the origin, including signed whole numbers (the integers) and 
signed fractions (the rationals); then expanding to the real numbers includ- 
ing both rational and irrational numbers. In particular, we focus on natural 
ways to perform operations such as addition, multiplication, subtraction, 
and division, using whole numbers, fractions, decimals, and so on, to high- 
light properties that can be used as a basis for formal axioms for the various 
number systems. 

Part II lays the foundations for set theory and logic, appropriate to the 
concept of proof used by mathematicians, with a sensible balance of logical 
precision and mathematical insight. In particular, the reader should note that 
it is essential to focus not only on what the definitions actually say, but also 
to be careful not to assume other properties that may arise not from the def- 
inition but from mental links set up by previous experience. For instance, 
students in school meet functions such as y = x² or f(x) = sin 3x, which
are always given by some kind of formula. However, the general notion of a 
function does not require a formula. All that is needed is that for each value 
of x (in a specified set) there is a single corresponding value of y. This broader 
definition applies to sets in general, not just to numbers. The properties that a 
defined concept must have are deduced from the definition by mathematical 
proof. 

Part III develops the axiomatic structures appropriate for the succes- 
sion of number systems, starting with axioms for natural numbers and 
proof by induction. The story continues by demonstrating how successive 
systems—integers, rationals, and real numbers—can be constructed from 
first principles using set-theoretic techniques. This process culminates in a 
list of axioms that defines the system of real numbers, with two operations 
(addition and multiplication) that satisfy specified properties of arithmetic 
and order, together with a ‘completeness axiom’ that states that any increasing
sequence bounded above must tend to a limit. These axioms define a
‘complete ordered field’, and we prove that they specify the real numbers
uniquely. Real numbers may be pictured as points on a line with the defined
operations of addition, multiplication, and order, where the line is filled out
to include irrational numbers such as √2, or as infinite decimals that may
be computed to any required accuracy as a finite decimal. For instance, √2
is 1.414 to 3 decimal places, while π is approximately equal to the fraction 22/7
and may be calculated to any desired accuracy as a decimal, say 3.14 to two
decimal places or 3.1415926536 to ten places.
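The claim that √2 can be computed to any required accuracy as a finite decimal can be made concrete. The following Python sketch (the function name `sqrt2_truncated` is our own, purely for illustration) uses exact integer arithmetic: `math.isqrt(N)` returns the largest integer whose square does not exceed N, so `math.isqrt(2 * 10**(2*k))` yields the first k decimal places of √2, truncated rather than rounded.

```python
import math

def sqrt2_truncated(places: int) -> str:
    """Return sqrt(2) truncated (not rounded) to `places` decimal places.

    isqrt(2 * 10**(2*places)) is the largest integer d with d*d <= 2 * 10**(2*places),
    that is, d = floor(sqrt(2) * 10**places). Only integers are used, so there is
    no floating-point error, however many places are requested.
    """
    d = math.isqrt(2 * 10 ** (2 * places))
    digits = str(d)
    # sqrt(2) lies between 1 and 2, so exactly one digit precedes the decimal point.
    return digits[0] + "." + digits[1:] if places > 0 else digits

print(sqrt2_truncated(3))   # 1.414
print(sqrt2_truncated(10))  # 1.4142135623
```

Asking for more places only means taking the integer square root of a larger number; every answer is still a finite decimal approximation, never the infinite decimal itself.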


Formal Systems and Structure Theorems 


This sequence of development, building a formal system from a carefully 
chosen list of axioms, can be generalised to cover a wide range of new situ- 
ations. It has a huge advantage compared to dealing with naturally occurring 
systems that are encountered in everyday life. The theorems that can be de- 
duced from a given list of axioms using formal proof must hold in any system 
that satisfies the axioms—old or new. Formal theorems are future-proofed. 
The theorems apply not only to systems that are already familiar, but also 
to any new system that satisfies the given axioms. This releases us from the 
necessity of re-checking our beliefs in every new system we encounter. This 
is a major step forward in mathematical thinking. 

Another more subtle development is that some theorems deduced within 
a formal system prove that the system has specific properties that allow it to 
be visualised in a certain way, and other properties that allow its operations 
to be carried out using symbolic methods. Such theorems are called structure 
theorems. For example, any complete ordered field has a unique structure 
that may be represented as points on a number line or as decimal expansions. 

This shifts formal proof to a new level of power. Having devoted
considerable effort to developing a consistent approach to formal proof, we
can ultimately develop new ways of thinking that blend together formal,
visual, and symbolic modes of operation, combining human ingenuity with
formal precision.


Using Formal Mathematics More Flexibly 


In Part IV we show how these more flexible methods can be applied in vari- 
ous contexts, first by applying the ideas to group theory and then to two quite 
different extensions of finite ideas to infinite concepts. One is the extension 
of counting from finite sets to infinite sets, by saying that two sets have the 
same cardinal number if all their elements can be paired so that each elem- 
ent in one set corresponds to precisely one element in the other. Cardinal 
numbers have many properties in common with regular counting numbers, 
but they also have new and unfamiliar properties. For instance, we can take 
away an infinite subset (such as the even numbers) from an infinite set (such 
as the natural numbers) to leave an infinite subset (the odd numbers) with 
the same cardinal number of elements as the original set. By the same token, 
subtraction cannot be uniquely defined for infinite cardinal numbers, nor
can division, so the reciprocal of an infinite cardinal number is not defined 
as a cardinal number. 
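The pairings behind the even and odd example can be written as formulas. In this Python sketch (the function names are hypothetical), k ↦ 2k pairs each natural number with an even number and k ↦ 2k - 1 pairs it with an odd number. A computer can only list finitely many pairs, so the code illustrates the pairings rather than proving anything about infinite sets; the formulas themselves, however, apply to every natural number k.

```python
def to_even(k: int) -> int:
    """Pair the natural number k with the even number 2k."""
    return 2 * k

def to_odd(k: int) -> int:
    """Pair the natural number k with the odd number 2k - 1."""
    return 2 * k - 1

def from_odd(m: int) -> int:
    """Undo to_odd: the odd number m is paired with k = (m + 1) / 2."""
    return (m + 1) // 2

first_ten = range(1, 11)
print([(k, to_even(k)) for k in first_ten])
print([(k, to_odd(k)) for k in first_ten])
```

Removing the evens from the natural numbers leaves the odds, which still pair off perfectly with all the natural numbers, whereas removing all the natural numbers leaves nothing at all. So ‘infinite minus infinite’ has no single answer.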

The second extension places the real numbers, which form a complete or- 
dered field, inside a larger (but not complete) ordered field. Here, an element 
k in the larger field may satisfy the order property ‘k > r for every real
number r’. In this sense, k is infinite: in the formally defined order, it is greater
than all real numbers. Yet this k behaves quite differently from an infinite 
cardinal number, because it has a reciprocal 1/k. Moreover, 1/k is smaller 
than any positive real number. 
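One standard example of such a larger ordered field, assumed here purely for illustration, is the field of rational functions p(x)/q(x), ordered so that p/q counts as positive when the leading coefficients of p and q have the same sign. In that order the element x is greater than every constant r, and its reciprocal 1/x is positive yet smaller than every positive constant. The Python sketch below is hypothetical throughout (the class `RatFunc` and its helpers are our own). It uses rational coefficients (`fractions.Fraction`) so that all arithmetic is exact, and it implements only subtraction, reciprocal, and comparison.

```python
from fractions import Fraction

def trim(coeffs):
    """Drop trailing zero coefficients; coeffs[i] is the coefficient of x**i."""
    coeffs = list(coeffs)
    while coeffs and coeffs[-1] == 0:
        coeffs.pop()
    return coeffs

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def poly_sub(p, q):
    """Subtract polynomial q from polynomial p."""
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return trim(a - b for a, b in zip(p, q))

class RatFunc:
    """The rational function num(x)/den(x), with rational coefficients."""

    def __init__(self, num, den=(1,)):
        self.num = trim(Fraction(c) for c in num)
        self.den = trim(Fraction(c) for c in den)
        if not self.den:
            raise ZeroDivisionError("denominator is the zero polynomial")

    def __sub__(self, other):
        # a/b - c/d = (ad - cb)/(bd)
        return RatFunc(poly_sub(poly_mul(self.num, other.den),
                                poly_mul(other.num, self.den)),
                       poly_mul(self.den, other.den))

    def reciprocal(self):
        if not self.num:
            raise ZeroDivisionError("0 has no reciprocal")
        return RatFunc(self.den, self.num)

    def is_positive(self):
        # Positive when numerator and denominator have leading coefficients
        # of the same sign (equivalently, the value is positive for all large x).
        return bool(self.num) and (self.num[-1] > 0) == (self.den[-1] > 0)

    def __gt__(self, other):
        return (self - other).is_positive()

k = RatFunc([0, 1])                          # the element x
print(k > RatFunc([10**9]))                  # True: x - r has leading coefficient 1 > 0
inv = k.reciprocal()                         # the element 1/x
print(inv > RatFunc([0]))                    # True: 1/x is positive
print(RatFunc([Fraction(1, 10**9)]) > inv)   # True: yet smaller than this positive constant
```

Here x plays the role of the infinite element k, and 1/x the role of its reciprocal.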

Upon reflection, we should not be surprised by these apparently contra- 
dictory possibilities, where an infinite number has a reciprocal in one system 
but not in another. The system of whole numbers that we use for counting 
does not provide reciprocals, but the systems of rational and real numbers 
do. If we select certain properties to generalise different systems, we should 
not be surprised if the generalisations are also different. 

This brings us to an important conclusion. Mathematics is a living subject, 
in which seemingly impossible ideas may become possible in a new formal 
context, determined by stating appropriate axioms. 

Writing over a century ago, when the new formal approach to mathemat- 
ics was becoming widespread, Felix Klein [4] wrote: 


Our standpoint today with regard to the foundations is different from that 
of the investigators of a few decades ago; and what we today would state 
as ultimate principles, will certainly be outstripped after a time. 


On the same page he noted: 


Many have thought that one could, or that one indeed must, teach all 
mathematics deductively throughout, by starting with a definite number 
of axioms and deducing everything from these by means of logic. This 
method, which some seek to maintain on the authority of Euclid, certainly 
does not correspond to the historical development of mathematics. In fact, 
mathematics has grown like a tree, which does not start from its tini- 
est roots and grow merely upward, but rather sends its roots deeper and 
deeper at the same time and rate that its branches and leaves are spread- 
ing upwards. Just so—if we may drop the figure of speech—mathematics 
began its development from a certain standpoint corresponding to normal 
human understanding and has progressed, from that point, according to 
the demands of science itself and of the then prevailing interests, now in 
one direction toward new knowledge, now in the other through the study 
of fundamental principles. 
We follow this development throughout the book by starting from the
experiences of students in school, digging deeper in Part II to find fundamental
ideas that we use in Part III to build into formal structures for number 
systems, and expanding the techniques to wider formal structures in Part 
IV. In Part V, we close this introduction to the foundations of mathematics 
by reflecting on the deeper development of fundamental logical principles 
that become necessary to support more powerful mathematical growth in 
the future. 


Exercises 


The following examples are intended to stimulate you into considering your 
own thought processes and your present mathematical viewpoint. Many of 
them do not have a ‘correct’ answer; however, it will be most illuminating
for you to write out solutions and keep them in a safe place to see how your 
opinions may change as you read the text. Later in the book (at the end of 
chapters 6 and 12) you will be invited to reconsider your responses to these 
questions to see how your thinking has changed. Don’t be afraid at this time 
to say that some of the ideas do not make sense to you at the moment. On 
the contrary, it is to your advantage to acknowledge any difficulties you may 
have. The intention of this book is that the ideas will become much clearer 
as you develop in sophistication. 


1. Think how you think about mathematics. If you meet a new problem 
which fits into a pattern that you recognise, your solution may follow 
a time-honoured logical course, but if not, then your initial attack may 
be anything but logical. Try these three problems and do your best to 
keep track of the steps you take as you move towards a solution. 

(a) John’s father is three times as old as John; in ten years he will only 
be twice John’s age. How old is John now? 

(b) A flat disc and a sphere of the same diameter are viewed from the 
same distance, with the plane of the disc at right angles to the line 
of vision. Which looks larger? 

(c) Two hundred soldiers stand in a rectangular array, in ten rows of 
twenty columns. The tallest man in each row is selected and of 
these ten, S is the shortest. Likewise the shortest in each column 
is singled out and T is the tallest of these twenty. Are S and T one 
and the same? If not, what can be deduced about the relative size 
of S and T?

Make a note of the way that you attempted these problems, as well as
your final solution, if you find one.
2. Consider the following two problems:

(a) Nine square metres of cloth are to be divided equally between five 
dressmakers; how much cloth does each one get? 

(b) Nine children are available for adoption and are to be divided 
equally between five couples; how many children are given to each 
couple? 

Both of these problems translate mathematically into: 


‘Find x such that 5x = 9’. 


Do they have the same solution? How can the mathematical formula- 
tion be qualified to distinguish between the two cases? 


3. Suppose that you are trying to explain negative numbers to someone 
who has not met the concept and you are faced with the comment: 


‘Negative numbers can’t exist because you can’t have less than 
nothing.’ 


How would you reply? 


4. What does it mean to say that a decimal expansion ‘recurs’? What 
fraction is represented by the decimal 0.333...? What about 0.999...?


5. Mathematical use of language sometimes differs from colloquial us- 
age. In each of the following statements, record whether you think 
that they are true or false. Keep them for comparison when you read 
chapter 6. 

(a) All of the numbers 2, 5, 17, 53, 97 are prime. 
(b) Each of the numbers 2, 5, 17, 53, 97 is prime. 
(c) Some of the numbers 2, 5, 17, 53, 97 are prime. 
(d) Some of the numbers 2, 5, 17, 53, 97 are even. 
(e) All of the numbers 2, 5, 17, 53, 97 are even.
(f) Some of the numbers 2, 5, 17, 53, 97 are odd.

6. ‘If pigs had wings, they’d fly.’

Is this a logical deduction? 

7. ‘The set of natural numbers 1, 2, 3, 4, 5, ... is infinite.’ Give an
explanation of what you think the word ‘infinite’ means in this context. 

8. A formal definition of the number 4 might be given in the following 
terms. 

First note that a set is specified by writing its elements between curly 
brackets { } and that the set with no elements is denoted by ∅. Then we
define 


$$4 = \{\emptyset, \{\emptyset\}, \{\emptyset, \{\emptyset\}\}, \{\emptyset, \{\emptyset\}, \{\emptyset, \{\emptyset\}\}\}\}.$$

Can you understand this definition? Do you think that it is suitable for 
a beginner? 


9. Which, in your opinion, is the most likely explanation for the equality
(-1) × (-1) = +1?


(a) A scientific truth discovered by experience. 

(b) A definition formulated by mathematicians as being the only 
sensible way to make arithmetic work. 

(c) A logical deduction from suitable axioms. 

(d) Some other explanation. 

Give reasons for your choice and retain your comments for later
consideration.


10. In multiplying two numbers together, the order does not matter:
xy = yx. Can you justify this result

(a) when x, y are both whole numbers? 

(b) when x, y are any real numbers? 

(c) for any numbers whatever? 
CHAPTER 2


Number Systems 


The reader will have built up a coherent understanding of the arithmetic of
the various number systems: counting numbers, negative numbers, and so on.
But he or she may not have subjected the processes of arithmetic to close
logical scrutiny. Later, we place these number
systems in a precise axiomatic setting. In this chapter we give a brief re- 
view of how the reader may have developed their ideas about these systems. 
Although constant use of the ideas will have smoothed out many of the dif- 
ficulties that were encountered when the concepts were being formed, these 
difficulties tend to reappear in the formal treatment and have to be dealt with 
again. It is therefore worth spending a little time to recall the development, 
before we plunge into the formalities. 

The experienced reader may feel tempted to skip this chapter because of 
the very simple level of the discussion. Please don’t. Every adult’s ideas have 
been built up from simple beginnings as a child. When trying to understand 
the foundations of mathematics, it is important to be aware of the genesis of 
your own mathematical thought processes. 


Natural Numbers 


The natural numbers are the familiar counting numbers 1, 2, 3, 4, 5, ....
Young children learn the names of these, and the order in which they come, 
by rote. Contact with adults leads the children to an awareness of the mean- 
ing that adults attach to phrases such as ‘two sweets’, ‘four marbles’. Use of 
the word ‘zero’ and the concept ‘no sweets’ is more subtle and follows later. 
To count a collection of objects, we point to them in turn while reciting 
‘one, two, three, ...’ until we have pointed to all of the objects, once each. 
Next we learn the arithmetic of natural numbers, starting with add- 
ition. At this stage the basic ‘laws’ of addition (which we can express 
algebraically as the commutative law a + b = b + a, and the associative law
a + (b + c) = (a + b) + c) may or may not be ‘obvious’, depending on the
approach used. If addition is introduced in terms of combining collections of
real-world objects and then counting the result, then these two laws depend 
only on the tacit assumption that rearranging the collection does not alter the 
number of things in it. Similarly, one modern approach using coloured rods 
whose lengths represent the numbers (which are added by placing them end 
to end) makes commutativity and associativity so obvious that it is almost 
confusing to have them pointed out. However, if a child is taught addition 
by ‘counting on’, the story is quite different. To calculate 3 + 4, he or she 
starts at 3 and counts on four more places: 4, 5, 6, 7. The calculation 4 + 3 
starts at 4 and counts on three places: 5, 6, 7. That the two processes yield the 
same answer is now much more mysterious. In fact children taught this way 
often have difficulty doing a calculation such as 1 + 17, but find 17 + 1 trivial! 
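The difference between the two calculations becomes visible if ‘counting on’ is written out step by step. In this Python sketch (the function `count_on` is a hypothetical name), each step names the next number, so 3 + 4 takes four steps and 4 + 3 takes three. Both end at 7, but nothing in the procedure itself makes that obvious. Counting on 17 from 1 takes seventeen steps, while counting on 1 from 17 takes only one.

```python
def count_on(start, steps):
    """Add `steps` to `start` the way a child counts on: say the next number, `steps` times.

    Returns the final number together with the list of numbers spoken aloud.
    """
    spoken = []
    current = start
    for _ in range(steps):
        current = current + 1   # move to the next number in the counting sequence
        spoken.append(current)
    return current, spoken

print(count_on(3, 4))  # (7, [4, 5, 6, 7])
print(count_on(4, 3))  # (7, [5, 6, 7])
print(len(count_on(1, 17)[1]), len(count_on(17, 1)[1]))  # 17 1
```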

Next we come to the concept of place-value. The number 33 involves two 
threes, but they don’t mean the same thing. It must be emphasised that this 
is purely a matter of notation, and has nothing to do with the numbers 
themselves. But it is a highly useful and important notation. It can represent 
(in principle) arbitrarily large numbers, and is very well adapted to calcula- 
tion. However, a precise mathematical description of the general processes 
of arithmetic in Hindu-Arabic place notation is quite complicated (which is 
why children take so long to learn them all) and not well adapted to, say, 
a proof of the commutative law. (This can be done, but it’s harder than we 
might expect.) Sometimes a more primitive system has some advantages. For 
instance, the ancient Egyptians used the symbol | to represent 1, a hoop ∩
to represent 10, the end of a scroll 𓍢 for 100, with other symbols for 1000,
etc. A number was written by repeating these symbols: thus 247 would have
been written

𓍢𓍢∩∩∩∩|||||||


Adding in Egyptian is easy: all we do is to put the symbols together. Now 
the commutative and associative laws are obvious again. But the notation 
is less suited to computation. To recover place-notation from Egyptian we
must supply some ‘carrying rules’, such as |||||||||| = ∩ (ten strokes make one
hoop), and insist that we never use any particular symbol more than nine times.
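The contrast between ‘putting the symbols together’ and place-value carrying can be sketched in Python. This is a simplified model under stated assumptions: only the three symbols named above are used (| for 1, ∩ for 10, 𓍢 for 100), there is no symbol for 1000, and the function names are hypothetical. Addition is plain concatenation of symbol strings; carrying is a separate tidying step.

```python
# Simplified model: only the three symbols mentioned in the text (no symbol for 1000).
ONE, TEN, HUNDRED = "|", "∩", "𓍢"
VALUE = {ONE: 1, TEN: 10, HUNDRED: 100}

def write(n):
    """Write 0 <= n < 1000 Egyptian-style: hundreds, then tens, then ones."""
    hundreds, rest = divmod(n, 100)
    tens, ones = divmod(rest, 10)
    return HUNDRED * hundreds + TEN * tens + ONE * ones

def value(s):
    """The number represented by a string of symbols, in any order."""
    return sum(VALUE[c] for c in s)

def add(a, b):
    """Egyptian addition: just put the two collections of symbols together."""
    return a + b

def carry(s):
    """Carrying rules: ten | become one ∩, and ten ∩ become one 𓍢."""
    ones, tens, hundreds = s.count(ONE), s.count(TEN), s.count(HUNDRED)
    tens, ones = tens + ones // 10, ones % 10
    hundreds, tens = hundreds + tens // 10, tens % 10
    return HUNDRED * hundreds + TEN * tens + ONE * ones

total = add(write(247), write(58))
print(total)         # the symbols of both numbers, side by side
print(carry(total))  # 𓍢𓍢𓍢|||||  (that is, 305)
```

Because `add` only collects symbols, swapping or regrouping the inputs changes at most the order in which the symbols appear, never how many of each there are; that is the sense in which the commutative and associative laws are ‘obvious’ in this notation.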

Before proceeding, we introduce a small amount of notation. We write N 
for the set of all natural numbers. The symbol ∈ will mean ‘is an element of’
or ‘belongs to’. So the symbols 


$$2 \in N$$


are read as ‘2 belongs to the set of natural numbers’, or in more usual 
language, ‘2 is a natural number’. 
Fractions


Fractions are introduced into arithmetic to make division possible. It is easy 
to divide 12 into 3 equal parts: 12 = 4 + 4 + 4. It is not possible to divide, say, 11
into 3 equal parts if we insist that these parts are natural numbers. Hence we 
are led to define fractions as m/n where m, n ∈ N and n ≠ 0. This
introduces a new idea, that different fractions such as 2/4 and 3/6 can involve two
different processes, where the first divides an object into 4 equal pieces and 
takes 2 of them to get 2 fourths while the second would divide the object into 
6 equal pieces and take 3 to get 3 sixths. The processes are different, but the 
quantity produced is the same (a half). These fractions are said to be equiva- 
lent. Equivalent fractions, when marked on a number line, are marked at the 
same point. 

This observation proves to be seminal throughout this book: equivalent 
concepts at one stage are often reconsidered as single entities later on. In this 
case equivalent fractions are considered as a single rational number. 

Operations of addition and multiplication on the set F of fractions can be 
defined algebraically by the rules 


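$$\frac{m}{n} + \frac{p}{q} = \frac{mq + np}{nq}, \qquad \frac{m}{n} \times \frac{p}{q} = \frac{mp}{nq}.$$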
It is straightforward (but somewhat tedious) to prove that if the fractions
are replaced by equivalent fractions, these formulas for the operations yield 
equivalent results. 
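For readers who like to experiment, here is a Python sketch of the rules for F, with the fraction m/n stored as a pair (m, n) of natural numbers (a representation chosen for illustration, with hypothetical function names). Two fractions m/n and p/q mark the same point on the number line exactly when mq = np. Checking many cases is not a proof, but it shows concretely what ‘equivalent inputs give equivalent results’ means.

```python
def add(f, g):
    """m/n + p/q = (mq + np)/(nq)."""
    (m, n), (p, q) = f, g
    return (m * q + n * p, n * q)

def mul(f, g):
    """(m/n) x (p/q) = (mp)/(nq)."""
    (m, n), (p, q) = f, g
    return (m * p, n * q)

def equivalent(f, g):
    """m/n and p/q are equivalent when mq = np."""
    (m, n), (p, q) = f, g
    return m * q == n * p

# 2/4 and 3/6 are equivalent, and so are 1/3 and 2/6.
print(equivalent((2, 4), (3, 6)))                           # True
print(add((2, 4), (1, 3)), add((3, 6), (2, 6)))             # (10, 12) (30, 36)
print(equivalent(add((2, 4), (1, 3)), add((3, 6), (2, 6)))) # True
print(equivalent(mul((2, 4), (1, 3)), mul((3, 6), (2, 6)))) # True
```

The two sums are different pairs, (10, 12) and (30, 36), yet they are equivalent: both name five sixths.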


Integers 


What fractions do for division, integers do for subtraction. A subtraction 
sum like 2 - 7 = ? cannot be answered in N. To do so, we introduce negative
numbers. Children are often introduced to negative numbers in terms of a 
‘number line’: a straight line with equally spaced points marked on it. One of 
them is called 0; then natural numbers 1, 2, 3, ... are marked successively to
the right, and negative numbers -1, -2, -3, ... to the left.


Fig. 2.1 The integers
This gives an extended number system called the ‘integers’. An integer is
either a natural number n, or a symbol -n where n is a natural number, or 0. 
We use Z to denote the set of integers. (Z is the initial letter of ‘Zahlen’, the
German for ‘numbers’; the integers are the ‘ganze Zahlen’.)

In your own learning, you met counting numbers N before the integers 
Z were introduced. This step is usually motivated by thinking of a negative 
number as a ‘debt’. Then we can see why we have the rule that ‘minus times 
minus makes plus’, because taking away a debt has the same result as giving 
the corresponding credit. 

Sometimes in school mathematics, a distinction may initially be made
between counting numbers, 1, 2, 3, ..., and positive integers +1, +2, +3, ..., with
their negative counterparts -1, -2, -3, .... There are times when this
distinction is useful or necessary. Indeed, later we start with counting numbers and
show how to construct integers formally. In this process there is a difference 
between the two. However, if we carry on maintaining such distinctions, 
we will only be making unnecessary work for ourselves. For example, the 
symbolic statement 4 - (+2) (taking away +2 from 4) involves a different op- 
eration from 4 + (-2) (adding -2 to 4). However, it is clearly sensible to say 
that both equal 4 - 2. 

In the same way, later we start with counting numbers and use set the- 
ory to construct integers. This process leads to a different symbolism for 
counting numbers and positive integers; however, they clearly have the same 
properties, so it is sensible to think of them as being the same. 

In set-theoretic notation, the symbol ⊆ means ‘is a subset of’. We then
have

$$N \subseteq Z,$$

where every natural number is also a (positive) integer. Similarly

$$N \subseteq F.$$


Rational Numbers 


The system Z is designed to allow subtraction in all cases; the system F allows 
division (except by zero). However, in neither system are both operations 
always possible. To get both working at once we move into the system of 
rational numbers Q (for ‘quotients’). This is obtained from F by introducing
‘negative fractions’ in much the same way that we obtained Z from N. 

We can still represent Q by points on a number line, by marking fractions 
at suitably spaced intervals between the integers, with negative ones to the 
left of 0 and positive ones to the right. For example, 4/3 is marked one third 
of the way between 1 and 2, like this: 
Fig. 2.2 Marking a rational number


The rules for adding and multiplying rational numbers are the same as for 
fractions, but now m, n, p, q are allowed to be integers (with n and q nonzero)
rather than natural numbers.

Both Z and F are subsets of Q. We can summarise the relations between 
the four number systems so far encountered by the diagram: 


Fig. 2.3 Four number systems 


Real Numbers 


Numbers can be used to measure lengths or other physical quantities. 
However, the Greeks discovered that there exist lines whose lengths, in 
theory, cannot be measured exactly by a rational number. They were 
magnificent geometers, and one of their simple but profound results was 
Pythagoras’ theorem. Applied to a right-angled triangle whose two shorter 
sides each have length 1, this implies that the hypotenuse has length x, where

$$x^2 = 1^2 + 1^2 = 2.$$

Fig. 2.4 Pythagoras and √2


However, x cannot be rational, because there is no rational number m/n such 
that (m/n)² = 2. To see why, we use the result that any natural number can
be factorised uniquely into primes. For instance, we can write 


$$360 = 2 \times 2 \times 2 \times 3 \times 3 \times 5$$
or
$$360 = 5 \times 2 \times 3 \times 2 \times 3 \times 2,$$
but however we write the factors we will always have one 5, two 3s, and three
2s. Using index notation we write 


$$360 = 2^3 \times 3^2 \times 5.$$


We shall prove this unique factorisation theorem formally in chapter 8 but 
for the moment we assume it without further proof. 

If we factorise any natural number into primes and then square, each 
prime will occur an even number of times. For instance, 


$$360^2 = (2^3 \times 3^2 \times 5)^2 = 2^6 \times 3^4 \times 5^2,$$


and the indices 6, 4, 2 are all even. A general proof is not hard to find. 
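The observation about squares can be checked with a short Python sketch. The function `factorise` (our own name) finds the prime factorisation by trial division and returns a dictionary {prime: index}. Squaring a number doubles every index, which is exactly why every index in a square is even.

```python
def factorise(n):
    """Prime factorisation of n >= 1 by trial division, as {prime: index}."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:          # divide out p as often as possible
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:                      # whatever remains is itself prime
        factors[n] = factors.get(n, 0) + 1
    return factors

print(factorise(360))        # {2: 3, 3: 2, 5: 1}
print(factorise(360 ** 2))   # {2: 6, 3: 4, 5: 2}
```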

Now take any rational number m/n and square it. (Since m/n has the same 
square as -m/n, we may assume m and n positive.) Factorise m² and n² and
cancel factors top and bottom if possible. Whenever a prime p cancels, then
since all primes occur to even powers it follows that p² cancels. Hence, after
cancellation, all primes still occur to even powers. But (m/n)² is supposed to
equal 2, which has one prime (namely 2) which only occurs once (which is 
an odd power). 

It follows that no rational number can have square 2, so the hypotenuse of 
the given triangle does not have rational length. 

With a little more algebraic symbolism we can tidy up this proof and 
present it as a formal argument, but the above is all that we really need. The 
same argument shows that numbers like 3, 3/4, or 5/7 do not have rational 
square roots. 
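The same argument gives a practical test. A Python `Fraction` is always stored in lowest terms a/b, so no prime occurs in both a and b. The argument above then says that a/b is the square of a rational exactly when every prime occurs to an even power in a and in b, that is, when a and b are both perfect squares. The sketch below (the function name is hypothetical) checks this with `math.isqrt`.

```python
import math
from fractions import Fraction

def has_rational_square_root(r):
    """Return True if r = (m/n)**2 for some rational m/n.

    Fraction stores r in lowest terms a/b. By the even-index argument, r is
    the square of a rational exactly when a and b are both perfect squares.
    """
    r = Fraction(r)
    if r < 0:
        return False            # squares of rationals are never negative
    a, b = r.numerator, r.denominator
    return math.isqrt(a) ** 2 == a and math.isqrt(b) ** 2 == b

for r in [2, 3, Fraction(3, 4), Fraction(5, 7), Fraction(9, 4)]:
    print(r, has_rational_square_root(r))   # only 9/4 = (3/2)**2 gives True
```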

The implication is clear. If we want to talk of lengths like √2, we must
enlarge our number system further. Not only do we need rational numbers, 
we need ‘irrational’ ones as well. 

Using Hindu-Arabic notation this can be done by introducing decimal 
expansions. We construct a right-angled triangle with sides of unit length, 
and using drawing instruments transfer the length of its hypotenuse to the 
number line. We then obtain a specific point on the number line that we call 
√2. It lies between 1 and 2 and, on subdividing the unit length from 1 to 2
into ten equal parts, we find that √2 lies between 1.4 and 1.5.


Fig. 2.5 Marking √2


By further subdividing the distance between 1.4 and 1.5 into ten equal
parts we might hope to obtain a better approximation to √2.
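On an idealised line, where marks have no thickness, the subdivision could be carried on indefinitely. The Python sketch below (the function name is hypothetical) imitates it with exact `Fraction` arithmetic: starting from the interval from 1 to 2, it cuts the current interval into ten equal parts and keeps the part whose endpoints have squares on either side of 2.

```python
from fractions import Fraction

def locate_sqrt2(places):
    """Return an interval (lo, lo + width) of width 10**-places containing sqrt(2).

    Start with [1, 2], since 1*1 < 2 < 2*2. At each step cut the current
    interval into ten equal parts and keep the part that contains sqrt(2),
    i.e. the first part whose right-hand end has square greater than 2.
    """
    lo, width = Fraction(1), Fraction(1)
    for _ in range(places):
        width /= 10
        k = 0
        while (lo + (k + 1) * width) ** 2 < 2:   # right end still below sqrt(2)
            k += 1
        lo += k * width
    return lo, lo + width

print([float(x) for x in locate_sqrt2(1)])  # [1.4, 1.5]
print([float(x) for x in locate_sqrt2(3)])  # [1.414, 1.415]
```

Nothing in the calculation limits the number of steps; the limitation discussed next is purely one of drawing.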
Fig. 2.6 Marking more accurately


Already in a practical situation we are reaching the limit of accuracy in 
drawing. We might imagine that in an accurate diagram we can look suf- 
ficiently close, or magnify the picture, to give the next decimal place. If we 
were to look at an actual picture under a magnifying glass, not only would the 
lengths be magnified, but so would the thickness of the lines in the drawing. 
This would not be a very satisfactory way to obtain a better estimate for √2.


Fig. 2.7 Using a magnifying glass 


Practical drawing is in fact extremely limited in accuracy. A fine drawing
pen marks a line 0·1 millimetres thick. Even if we use a line 1 metre long as a
unit length, since 0·1 mm = 0·0001 metres, we could not hope to be accurate
to more than four decimal places. Using much larger paper and more refined
instruments gives surprisingly little increase in accuracy in terms of the number of decimal places we can find. A light year is approximately $9{\cdot}5 \times 10^{15}$
metres. As an extreme case, suppose we consider a unit length $10^{18}$ metres
long. If a light ray started out at one end at the same time that a baby was
born at the other, the baby would have to live to be over 100 years old before
seeing the light ray. At the lower extreme of vision, the wavelength of red
light is approximately $7 \times 10^{-7}$ metres, so a length of $10^{-7}$ metres is smaller
than the wavelength of visible light. Hence an ordinary optical microscope
cannot distinguish points which are $10^{-7}$ metres apart. On a line where the
unit length is $10^{18}$ metres we cannot distinguish numbers which are less than
$10^{-7}/10^{18} = 10^{-25}$ apart. This means that we cannot achieve an accuracy of
25 decimal places by a drawing. Even this is a gross exaggeration in practice,
where three or four decimal places is often the best we can really hope for.
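The figures in this paragraph can be checked with a few lines of Python. This is only a sketch: the constants are the rounded values quoted above, and ordinary floating point is adequate because only orders of magnitude matter.

```python
import math

# Approximate values quoted in the text.
LIGHT_YEAR_M = 9.5e15   # metres travelled by light in one year
UNIT_M = 1e18           # the hypothetical unit length, in metres
RESOLUTION_M = 1e-7     # smaller than the wavelength of red light (about 7e-7 m)

# How long light takes to travel from one end of the unit to the other.
years_to_cross = UNIT_M / LIGHT_YEAR_M

# The smallest separation we could hope to see, measured in units of the line.
relative_resolution = RESOLUTION_M / UNIT_M

# The decimal place at which that resolution is reached (10**-25 -> place 25).
limiting_place = round(-math.log10(relative_resolution))

print(f"light needs about {years_to_cross:.0f} years to cross the unit length")
print(f"relative resolution about 1e-{limiting_place}")
```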


Inaccurate Arithmetic in Practical Drawing 


The inherent inaccuracy in practice leads to problems in arithmetic. If we
add two inaccurate numbers, the errors also add. If we cannot distinguish
errors less than some amount $\varepsilon$, then we cannot tell the difference, in practice, between $a$ and $a + \tfrac12\varepsilon$, or between $b$ and $b + \tfrac12\varepsilon$. But on adding, we can
distinguish between $a + b$ and $a + b + \varepsilon$. When we come to multiplication,
errors can increase even more dramatically. We cannot hope to get answers
to the same degree of accuracy as the numbers used in the calculation.

If we use arithmetic to calculate all answers correct to a certain number
of decimal places, the errors involved lead to some disturbing results. Suppose, for example, that we work to two decimal places (‘rounding up’ if the
third place is 5 or more and down if it is less). Given two real numbers a
and b, we denote their product correct to two decimal places by $a \otimes b$. For
example, $3{\cdot}05 \otimes 4{\cdot}26 = 12{\cdot}99$ because $3{\cdot}05 \times 4{\cdot}26 = 12{\cdot}993$. Using this law
of multiplication we find that


$$(1{\cdot}01 \otimes 0{\cdot}5) \otimes 10 \neq 1{\cdot}01 \otimes (0{\cdot}5 \otimes 10).$$


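The left-hand side reduces to $0{\cdot}51 \otimes 10 = 5{\cdot}1$ (because $1{\cdot}01 \times 0{\cdot}5 = 0{\cdot}505$ rounds up to $0{\cdot}51$), whilst the right-hand side
becomes $1{\cdot}01 \otimes 5 = 5{\cdot}05$. This is by no means an isolated example, and it
shows that the associative law does not hold for $\otimes$.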
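The rounded product $\otimes$ is easy to imitate with Python’s standard `decimal` module. In this sketch `otimes` is our own name for $\otimes$, and `ROUND_HALF_UP` implements the rule ‘round up if the third place is 5 or more’ for the positive numbers used here.

```python
from decimal import Decimal, ROUND_HALF_UP

TWO_PLACES = Decimal("0.01")

def otimes(a, b):
    """Product of a and b correct to two decimal places, rounding a third-place 5 upwards."""
    return (Decimal(a) * Decimal(b)).quantize(TWO_PLACES, rounding=ROUND_HALF_UP)

lhs = otimes(otimes("1.01", "0.5"), "10")   # (1.01 (x) 0.5) (x) 10
rhs = otimes("1.01", otimes("0.5", "10"))   # 1.01 (x) (0.5 (x) 10)
print(otimes("3.05", "4.26"), lhs, rhs, lhs == rhs)
```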

If we further define $a \oplus b$ to be the sum correct to two decimal places, we
will find other laws that do not hold, including the distributive law

$$a \otimes (b \oplus c) = (a \otimes b) \oplus (a \otimes c).$$
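A sketch in the same style shows the distributive law failing. The helper names and the particular values $a = 0{\cdot}5$, $b = c = 0{\cdot}01$ are our own choices. They were picked because each separate product $0{\cdot}5 \times 0{\cdot}01 = 0{\cdot}005$ is rounded up on its own.

```python
from decimal import Decimal, ROUND_HALF_UP

TWO_PLACES = Decimal("0.01")

def round2(x):
    # Correct to two decimal places, rounding a third-place 5 upwards.
    return Decimal(x).quantize(TWO_PLACES, rounding=ROUND_HALF_UP)

def otimes(a, b):
    return round2(Decimal(a) * Decimal(b))

def oplus(a, b):
    return round2(Decimal(a) + Decimal(b))

a, b, c = "0.5", "0.01", "0.01"
left = otimes(a, oplus(b, c))              # a (x) (b (+) c)
right = oplus(otimes(a, b), otimes(a, c))  # (a (x) b) (+) (a (x) c)
print(left, right)
```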


A Theoretical Model of the Real Line 


We have just seen that if our measurement of numbers is not precise, then 
some of the laws of arithmetic break down. To avoid this we must make our 
notion of real number exact. 

Suppose we are given a real number x on a theoretical real line, and we try 
to express it as a decimal expansion. As a starting point, we see that x lies 
between two integers. 


Fig. 2.8 Marking a real number


In the above example x is between 2 and 3, so x is ‘two point something’.
Next we divide the interval between 2 and 3 into ten equal parts. 

Again, x lies in some sub-interval. In the picture, x lies between 2·4 and
2·5, so x is ‘2·4 something’. To obtain a still better idea, we divide the interval
between 2·4 and 2·5 into ten equal parts and repeat the process to find the
next figure in the decimal expansion. Already, in a practical situation, we are
reaching the limit of accuracy in drawing.


Fig. 2.9 Marking more accurately


For our theoretical picture we must imagine that we can look sufficiently 
closely, or magnify the picture, to read off the next decimal place. If we 
looked at an actual picture under a magnifying glass, not only would the 
lengths be magnified, but so would the thickness of the lines. 


Fig. 2.10 Magnifying 


This is not very satisfactory for getting a better estimate. We must, in the 
theoretical case, assume that the lines have no thickness, so that they are 
not made wider when the picture is magnified. We can represent this as a 
practical picture by drawing the magnified lines with the same drawing implements as before, and making them as fine as possible. In this case x lies
between 2·43 and 2·44, so x is ‘2·43 something’.


Fig. 2.11 Magnifying more accurately


Using this method we can, in theory, represent any real number as a deci- 
mal expansion to as many figures as we require. If we are careful to define this 
expansion to avoid ambiguity, two numbers will be different if, by calculating
sufficiently many terms, we eventually obtain different answers for some 
decimal place. 

We can express this theoretical method as follows in more mathematical 
terms. 


(i) Given a real number x, find an integer $a_0$ such that

$$a_0 \le x < a_0 + 1.$$

(ii) Find a whole number $a_1$ between 0 and 9 inclusive such that

$$a_0 + \frac{a_1}{10} \le x < a_0 + \frac{a_1 + 1}{10}.$$

(iii) After finding $a_0, a_1, \ldots, a_{n-1}$, where $a_1, \ldots, a_{n-1}$ are integers between
0 and 9 inclusive, find the integer $a_n$ between 0 and 9 inclusive for
which

$$a_0 + \frac{a_1}{10} + \cdots + \frac{a_n}{10^n} \le x < a_0 + \frac{a_1}{10} + \cdots + \frac{a_n + 1}{10^n}.$$

This gives an inductive process which at the nth stage determines x to n
decimal places:

$$a_0{\cdot}a_1a_2\ldots a_n \le x < a_0{\cdot}a_1a_2\ldots a_n + 1/10^n.$$
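Steps (i) to (iii) can be sketched in Python. The function name, and the idea of describing x by a comparison test `at_most_x(q)` (is the rational q at most x?), are our own assumptions. This lets the sketch handle √2 using only exact rational arithmetic, since $q \le \sqrt{2}$ exactly when $q \le 0$ or $q^2 \le 2$.

```python
from fractions import Fraction

def decimal_expansion(at_most_x, places):
    """Return (a_0, [a_1, ..., a_places]) for a real x, following steps (i)-(iii).

    at_most_x(q) must return True exactly when the rational q satisfies q <= x.
    """
    # Step (i): find the integer a_0 with a_0 <= x < a_0 + 1.
    a0 = 0
    while not at_most_x(Fraction(a0)):
        a0 -= 1
    while at_most_x(Fraction(a0 + 1)):
        a0 += 1
    digits = []
    lower = Fraction(a0)  # the decimal a_0.a_1...a_(n-1) found so far
    for n in range(1, places + 1):
        step = Fraction(1, 10 ** n)
        # Steps (ii)/(iii): the largest digit a_n in 0..9 with lower + a_n/10^n <= x.
        digit = 0
        while digit < 9 and at_most_x(lower + (digit + 1) * step):
            digit += 1
        digits.append(digit)
        lower += digit * step
    return a0, digits

def at_most_sqrt2(q):
    # q <= sqrt(2) exactly when q <= 0 or q*q <= 2, so only rational arithmetic is needed.
    return q <= 0 or q * q <= 2

print(decimal_expansion(at_most_sqrt2, 6))
print(decimal_expansion(lambda q: q <= Fraction("1066.317"), 5))
```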


The theoretically exact representation of the number x requires a decimal
expansion

$$a_0{\cdot}a_1a_2a_3a_4a_5a_6\ldots$$

that goes on forever. (Of course, if all $a_n$ from some point on are zero, we
omit them in normal notation; instead of 1066·31700000000... we write
1066·317.) An infinite decimal is called a real number. The set of all real
numbers is denoted by $\mathbb{R}$.

In most practical situations we will need only a few decimal places. Earlier 
we saw that 25 decimal places are sufficient for all ratios of lengths within 
human visual capacity, and that two or three places are usually sufficient for 
many practical purposes. 


Different Decimal Expansions for Different 
Numbers 


If we expand a number x as above in an endless decimal, we say that
$a_0{\cdot}a_1a_2\ldots a_n$ is the expansion of x to the first n decimal places (without
‘rounding up’). If two real numbers x and y have the same decimal expansion
to n places then

$$a_0{\cdot}a_1\ldots a_n \le x < a_0{\cdot}a_1\ldots a_n + 1/10^n,$$

$$a_0{\cdot}a_1\ldots a_n \le y < a_0{\cdot}a_1\ldots a_n + 1/10^n.$$

The second line of inequalities can be rewritten as

$$-a_0{\cdot}a_1\ldots a_n - 1/10^n < -y \le -a_0{\cdot}a_1\ldots a_n.$$

Adding this to the first line we obtain

$$-1/10^n < x - y < 1/10^n.$$

In other words, if two real numbers have the same decimal expansion to n
places, then they differ by less than $1/10^n$.

If x and y are different numbers on the line and we wish to distinguish 
between them, all we need do is find n such that $1/10^n$ is less than their difference: then their expansions to n places will differ. This again exposes the
deficiencies of practical drawing, where x and y might be too close to dis- 
tinguish. In our theoretical concept of the real line, this distinction must 
always be possible. It is so important that it is worth giving it a name. The 
great Greek mathematician Archimedes stated a property that is equivalent 
to what we want, so we shall name our condition after him: 


Archimedes’ Condition: Given a positive real number $\varepsilon$, there exists a
positive integer n such that $1/10^n < \varepsilon$.
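Archimedes’ condition and the test for telling two numbers apart can be sketched with exact rational arithmetic from Python’s `fractions` module. The function names are hypothetical. The sketch assumes that ε, x and y are rational, so that every comparison is exact. The truncations must differ at place n because, as shown above, equal expansions to n places would force $|x - y| < 1/10^n$.

```python
import math
from fractions import Fraction

def archimedes_n(eps):
    """Smallest positive integer n with 1/10**n < eps, for a positive rational eps."""
    eps = Fraction(eps)
    if eps <= 0:
        raise ValueError("eps must be positive")
    n = 1
    while Fraction(1, 10 ** n) >= eps:
        n += 1
    return n

def truncate(x, n):
    """The expansion of the rational x to n decimal places, without rounding up."""
    return Fraction(math.floor(x * 10 ** n), 10 ** n)

# Two hypothetical rationals that agree to five decimal places.
x = Fraction(1, 3)
y = x + Fraction(1, 10 ** 6)
n = archimedes_n(y - x)   # now 1/10**n is less than the difference y - x
print(n, truncate(x, n), truncate(y, n))
```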


Rationals and Irrationals 


As we have seen, the real number √2 is irrational: so are many others.
It is not always easy to prove a given number irrational. (It’s moderately
easy for e, less so for π, and there are many interesting numbers which
mathematicians have been convinced for centuries are irrational, but which
have never been proved to be so.) But just the fact that √2 is irrational implies
that between any two rational numbers there exist irrational numbers. First
we need:


Lemma 2.1: If m/n and r/s are rational, with $r/s \neq 0$, then $m/n + (r/s)\sqrt{2}$
is irrational.


Proof: Suppose that $m/n + (r/s)\sqrt{2}$ is rational, equal to p/q where p, q are
integers. Solve for √2 to obtain

$$\sqrt{2} = \frac{(pn - mq)s}{qnr}$$

(the denominator is non-zero because $r/s \neq 0$), which is rational, contrary to the irrationality of √2.
Proposition 2.2: Between any two distinct rational numbers there exists
an irrational number.

Proof: Let the rational numbers be m/n and r/s, where $m/n < r/s$. Then

$$\frac{m}{n} < \frac{m}{n} + \frac{\sqrt{2}}{2}\left(\frac{r}{s} - \frac{m}{n}\right) < \frac{r}{s}$$

(because $\sqrt{2}/2 < 1$), and the number in the middle is irrational by the
lemma, since $\tfrac12(r/s - m/n)$ is a non-zero rational.


There is a corresponding result with ‘rational’ and ‘irrational’ inter- 
changed: 


Proposition 2.3: Between any two distinct irrational numbers there exists 
a rational number. 


Proof: Let the irrational numbers be a, b with $a < b$. Consider their decimal
expansions, and let the nth decimal place be the first in which they differ.
Then

$$a = a_0{\cdot}a_1\ldots a_{n-1}a_n\ldots,$$

$$b = a_0{\cdot}a_1\ldots a_{n-1}b_n\ldots,$$

where $a_n < b_n$ because $a < b$. Let $x = a_0{\cdot}a_1\ldots a_{n-1}b_n$. Then x is rational and $a < x \le b$, since $a < a_0{\cdot}a_1\ldots a_{n-1}a_n + 1/10^n \le x$.
But since b is irrational, $x \neq b$, so we must have $a < x < b$.
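The proof is constructive, so it can be sketched in code. In the sketch a real number is represented, as our own assumption, by the function $n \mapsto 10^n \times$ (its expansion to n places). For square roots of integers this can be computed exactly with `math.isqrt`. The parameter `max_places` is an arbitrary cut-off, since a program cannot search an infinite expansion.

```python
import math
from fractions import Fraction

def sqrt_truncation(k):
    """Return n -> floor(sqrt(k) * 10**n): sqrt(k) to n places, scaled by 10**n (exact)."""
    return lambda n: math.isqrt(k * 10 ** (2 * n))

def rational_between(trunc_a, trunc_b, max_places=50):
    """Given a < b through their scaled truncations, return the x of Proposition 2.3:
    b cut off at the first decimal place where the expansions of a and b differ."""
    for n in range(max_places + 1):
        if trunc_a(n) != trunc_b(n):
            return Fraction(trunc_b(n), 10 ** n)
    raise ValueError("expansions agree to max_places; cannot separate a and b")

x = rational_between(sqrt_truncation(2), sqrt_truncation(3))
y = rational_between(sqrt_truncation(200), sqrt_truncation(201))
print(x, y)
```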


In fact, the exercises at the end of this chapter show that the rational and 
irrational numbers are mixed up in a very complicated way. One should not 
make the mistake of thinking that they ‘alternate’ along the real line. 

The rational numbers may be characterised as those whose decimal ex- 
pansions repeat at regular intervals (though we shall omit the proof). To be 
precise, say that a decimal is repeating if, from some point on, a fixed sequence of digits repeats indefinitely. For example, 1·5432174174174174...
is a repeating decimal. We shall write it as $1{\cdot}5432\dot{1}7\dot{4}$, with dots over the end
digits of the block that repeats.
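The text omits the proof. One direction, that every rational number has a repeating expansion, can be seen in long division. Each new digit depends only on the current remainder, and dividing by q leaves at most q different remainders, so some remainder must recur and the digits repeat from that point. The following minimal sketch (the function name is ours) returns the digits before the repeating block and the block itself.

```python
def repeating_decimal(p, q):
    """Expansion of p/q (p >= 0, q > 0) as (integer part, digits before the block, repeating block)."""
    integer_part, r = divmod(p, q)
    digits, first_seen = [], {}
    while r not in first_seen:
        first_seen[r] = len(digits)   # where the digits produced from remainder r begin
        digit, r = divmod(10 * r, q)
        digits.append(digit)
    start = first_seen[r]
    return integer_part, digits[:start], digits[start:]

print(repeating_decimal(2, 7))                # 2/7 = 0.285714285714...
print(repeating_decimal(15416742, 9990000))   # the example 1.5432174174174...
```

The fraction 15416742/9990000 is our own working for the example above, namely $15432/10^4 + 174/9990000$. A terminating decimal such as 1/4 comes out with the repeating block [0], that is, with trailing zeros.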


The Need for Real Numbers 


The Greeks’ belief that all numbers are rational (enshrined in the mystic phil- 
osophy of the cult of Pythagoreans) led them to a logical impasse. Viewing 
the real numbers as infinite decimals helps to overcome this mental block, 
because it makes it clear that rational numbers, whose expansions repeat, do
not exhaust the possibilities. 

However, we have also seen that for practical purposes we do not need in- 
finite decimals, nor even very long finite ones. Why go to all the trouble? One 
reason we have already noted: the arithmetic of decimals of limited length 
fails to obey the familiar laws which integers and rational numbers obey. A 
perhaps more serious reason arises in analysis. 

Consider the function f given by 


$$f(x) = x^2 - 2 \quad (x \in \mathbb{R}).$$

This is negative at x = 1, positive at x = 2. In between, it is zero at $x = \sqrt{2}$.
However, if we restrict x to take only rational values, the function

$$f(x) = x^2 - 2 \quad (x \in \mathbb{Q})$$

is also negative at x = 1, positive at x = 2, but is not zero at any rational x in
between, because $x^2 = 2$ has no rational solution.
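Bisection makes the difficulty concrete. It is our choice of illustration, not a method from the text. Working only with rationals, we can trap the sign change of f in ever smaller intervals, yet no rational point we test is ever a zero of f.

```python
from fractions import Fraction

def f(x):
    return x * x - 2

def bisect_rationals(lo, hi, steps):
    """Halve [lo, hi] repeatedly, keeping f(lo) < 0 < f(hi); every point tried is rational."""
    lo, hi = Fraction(lo), Fraction(hi)
    for _ in range(steps):
        mid = (lo + hi) / 2
        value = f(mid)
        # A zero here would be a rational square root of 2, which cannot exist.
        assert value != 0
        if value < 0:
            lo = mid
        else:
            hi = mid
    return lo, hi

lo, hi = bisect_rationals(1, 2, 40)
print(float(lo), float(hi))   # both close to sqrt(2), yet f vanished at no midpoint
```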


Fig. 2.12 No rational solution 


This is a nuisance. A fundamental theorem in analysis asserts that if a 
continuous function is negative at one point and positive at another, then 
it must be zero in between. This is true for functions over the real numbers, 
but not for functions over the rationals. A civilisation such as that of the an- 
cient Greeks, with no satisfactory method for handling irrational numbers, 
cannot build a theory of limits, or invent calculus. 
Arithmetic of Decimals


The idea of infinite decimals representing real numbers is a useful one, 
but it is not well suited to numerical manipulations, nor to theoretical 
investigations beyond an elementary level. We add two finite decimals by 
starting at the right-hand end, but infinite decimals do not have right-hand 
ends, so there is nowhere to start. 

We can instead start at the left-hand end, adding the first decimal places, 
then the first two, then the first three, and so on. To see what happens, try 
adding $2/3 = 0{\cdot}\dot{6}$ and $2/7 = 0{\cdot}\dot{2}8571\dot{4}$ in this way.


·6 + ·2 = ·8
·66 + ·28 = ·94
·666 + ·285 = ·951
·6666 + ·2857 = ·9523
·66666 + ·28571 = ·95237
·666666 + ·285714 = ·952380.


The actual answer is $2/3 + 2/7 = 20/21 = 0{\cdot}\dot{9}5238\dot{0}$. Notice that adding the
first decimal places does not give the answer to one decimal place, nor does 
adding the first two places give the first two places of the answer. This is pre- 
cisely because of the possibility of ‘carried’ digits from later places affecting 
earlier ones. 

However, in this example, successive terms increase and get closer 
and closer to the actual answer. The sequence of numbers ·8, ·94, ·951,
·9523, ... is an increasing sequence of real numbers, and it ‘tends to’ 20/21
in the sense that the error can be made as small as we please by calculating 
enough decimal places. 
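The left-to-right additions above can be reproduced exactly with Python’s `fractions` module (the helper name `truncate` is ours). The sketch also checks the two claims in the text. First, the partial sums increase. Second, each truncation falls short of its number by less than $1/10^n$, so the error after n places is less than $2/10^n$.

```python
import math
from fractions import Fraction

def truncate(x, n):
    """x to n decimal places, without rounding up."""
    return Fraction(math.floor(x * 10 ** n), 10 ** n)

a, b = Fraction(2, 3), Fraction(2, 7)
exact = a + b                                    # 20/21
partials = [truncate(a, n) + truncate(b, n) for n in range(1, 7)]
for n, s in enumerate(partials, start=1):
    # s is exact; float() is only used for display.
    print(n, float(s), float(truncate(exact, n)), float(exact - s))
```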

In the next few sections we shall examine in detail the ideas required to 
make this concept precise. For theoretical purposes it is often easier to use in- 
creasing sequences (of approximations to a real number) rather than decimal 
expansions. 


Sequences 


A sequence of real numbers can be thought of as an endless list 
$$a_1, a_2, a_3, a_4, \ldots, a_n, \ldots$$

where each term $a_n$ is a real number. (Using set theory we shall give a more
formal definition in chapter 5.) 
Examples 2.4:


(1) The sequence of squares: 1, 4, 9, 16, ..., where $a_n = n^2$.

(2) The sequence of decimal approximations to √2 is 1·4, 1·41,
1·414, ..., where $a_n$ = √2 to n places.

(3) The sequence $1, 1\tfrac12, 1\tfrac56, 2\tfrac1{12}, \ldots$, where $a_n = 1 + \frac12 + \frac13 + \cdots + \frac1n$.

(4) The sequence 3, 1, 4, 1, 5, 9, ..., where $a_n$ = the nth digit in the decimal
expansion of π.


We often use the shorthand notation

$$(a_n)$$

for the sequence $a_1, a_2, \ldots$, where the nth term is placed in round brackets.
Thus example (1) could be written $(n^2)$.

Notice how general the concept of a sequence is. We can consider any 
endless list of numbers. It is not necessary that the nth term be defined by a 
‘nice formula’, as long as we know what each $a_n$ is supposed to be.

Sequences can be added, subtracted, or multiplied. It is necessary to de- 
fine what we mean by this: the simplest way is to perform the operations on 
each pair of terms in corresponding positions. In other words, to add the 
sequences 


$$a_1, a_2, \ldots$$

and

$$b_1, b_2, \ldots$$

means to form the sequence

$$a_1 + b_1,\ a_2 + b_2,\ \ldots.$$

For example, if $a_n = n^2$ and $b_n = 1 + \frac12 + \cdots + \frac1n$, then the nth term of $(a_n) + (b_n)$
is

$$n^2 + 1 + \frac12 + \cdots + \frac1n.$$

Since the nth term of the sequence $(a_n) + (b_n)$ is $a_n + b_n$, we can express the
rule for addition as

$$(a_n) + (b_n) = (a_n + b_n).$$

Similarly the rules for subtraction and multiplication are

$$(a_n) - (b_n) = (a_n - b_n),$$

$$(a_n)(b_n) = (a_n b_n).$$
In the case of division we put

$$(a_n)/(b_n) = (a_n/b_n),$$

noting that this division can be carried out only when all terms $b_n$ are
non-zero.
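The termwise rules can be sketched by modelling a sequence as a Python function from n = 1, 2, 3, ... to the nth term. This representation and the helper names are our own choices. Exact fractions keep terms such as $1 + \frac12 + \cdots + \frac1n$ exact.

```python
from fractions import Fraction

# A sequence is modelled (our assumption) as a function n -> a_n for n = 1, 2, 3, ...
def squares(n):
    return Fraction(n * n)

def harmonic(n):
    # 1 + 1/2 + ... + 1/n
    return sum(Fraction(1, k) for k in range(1, n + 1))

def add(a, b):
    return lambda n: a(n) + b(n)      # (a_n) + (b_n) = (a_n + b_n)

def sub(a, b):
    return lambda n: a(n) - b(n)      # (a_n) - (b_n) = (a_n - b_n)

def mul(a, b):
    return lambda n: a(n) * b(n)      # (a_n)(b_n) = (a_n b_n)

def div(a, b):
    # (a_n)/(b_n) = (a_n / b_n); only meaningful when every b_n is non-zero.
    def quotient(n):
        if b(n) == 0:
            raise ZeroDivisionError(f"term b_{n} is zero")
        return a(n) / b(n)
    return quotient

def first_terms(seq, count):
    return [seq(n) for n in range(1, count + 1)]

print(first_terms(harmonic, 4))
print(first_terms(add(squares, harmonic), 3))
```

Note that `div` only raises an error when a zero term is actually requested. It cannot check in advance that all infinitely many $b_n$ are non-zero.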


Example 2.5: If $a_n$ = √2 to n decimal places, and $b_n$ = the nth digit in the decimal
expansion of π (as in example (4)), then the first few terms of $(a_n)(b_n)$ are

1·4 × 3 = 4·2
1·41 × 1 = 1·41
1·414 × 4 = 5·656
1·4142 × 1 = 1·4142.
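Example 2.5 can be reproduced in a few lines. The sketch hard-codes the leading digits of π rather than computing them. It takes $b_n$ to be the nth digit counting the initial 3, which is what the products above use. The names `sqrt2_places` and `pi_digit` are ours.

```python
import math
from decimal import Decimal

PI_DIGITS = "31415926535"   # pi = 3.1415926535...

def sqrt2_places(n):
    """a_n: sqrt(2) to n decimal places without rounding, as an exact Decimal."""
    return Decimal(math.isqrt(2 * 10 ** (2 * n))) / 10 ** n

def pi_digit(n):
    """b_n: the nth digit of pi, counting the leading 3 as n = 1 (as in example (4))."""
    return int(PI_DIGITS[n - 1])

products = [sqrt2_places(n) * pi_digit(n) for n in range(1, 5)]
print(products)
```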


If you were given the sequence 4·2, 1·41, 5·656, 1·4142, could you have
guessed the rule for the nth term? This drives home the point that in order
to specify a sequence we must know in principle how to calculate all of its
terms. In general, it is not enough to write down the first few terms and a
few dots. The sequence 3, 1, 4, 1, 5, 9, ... certainly looks as if it consists of the
digits of π. However, it might just as well be the sequence of digits of the
number 355/113, which starts off the same way. This is why, in example (4),
we specify the general rule for finding the nth term.
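To see how long 355/113 keeps up the pretence, the sketch below computes its digits by long division and compares them with the leading digits of π, which are hard-coded. The function name is ours. The two sequences agree in their first seven digits, 3, 1, 4, 1, 5, 9, 2, and first differ at the eighth.

```python
PI_DIGITS = "3141592653589793"   # pi = 3.141592653589793...

def fraction_digits(p, q, count):
    """The first `count` digits of p/q (assumed to lie between 1 and 10) by long division."""
    digits, r = [], p
    for _ in range(count):
        digit, r = divmod(r, q)
        digits.append(digit)
        r *= 10
    return digits

approx = fraction_digits(355, 113, len(PI_DIGITS))
pi = [int(c) for c in PI_DIGITS]
first_difference = next(i for i, (u, v) in enumerate(zip(approx, pi)) if u != v)
print(approx[:8], pi[:8], first_difference)
```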

Nevertheless, you will often find mathematicians writing things like 
2, 4, 8, 16, 32, ... and expecting you to infer that the nth term is $2^n$. One aspect of learning mathematics is to understand how mathematicians actually
work, and what their idiosyncrasies are: you should be prepared to accept 
slight differences in notation provided that the idea is clear from the context. 


Order Properties and the Modulus 


We digress to introduce an important concept. If x is a real number we define 
the modulus or absolute value of x to be 


$$|x| = \begin{cases} x & \text{if } x \ge 0,\\ -x & \text{if } x < 0. \end{cases}$$


The graph of |x| against x looks like: 
Fig. 2.13 The modulus function


The value of |x| tells us how large or small x is, ignoring whether it is 
positive or negative. Perhaps the most useful fact about the modulus is the 
triangle inequality, so called because its generalisation to complex numbers 
expresses the fact that each side of a triangle is shorter than the other two put 
together. It is: 


Proposition 2.6 (Triangle Inequality): If x and y are real numbers, then

$$|x + y| \le |x| + |y|.$$


Proof: The visual idea is that |x + y| says how far from the origin x + y is, and 
this is at most the sum of the distances |x| and |y| of x and y from the origin, 
being less if x and y have opposite sign. (Draw some pictures to check this.) 
The easiest way to prove it logically is to divide into cases, according to the 
signs and relative sizes of x and y. 


(i) $x \ge 0$, $y \ge 0$. Then $x + y \ge 0$, so
$$|x + y| = x + y = |x| + |y|.$$
(ii) $x \ge 0$, $y < 0$. If $x + y \ge 0$ then
$$|x + y| = x + y < x - y = |x| + |y|.$$
On the other hand, if $x + y < 0$ then
$$|x + y| = -(x + y) = -x - y \le x - y = |x| + |y|.$$

(iii) $x < 0$, $y \ge 0$ follows as in case (ii) with x and y interchanged.

(iv) $x < 0$, $y < 0$. Then $x + y < 0$, so

$$|x + y| = -x - y = |x| + |y|.$$
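A finite spot check is no substitute for the proof, but it is a useful sanity test when learning the case analysis. The sketch below uses exact fractions and labels each pair with the case of the proof it falls under. It also confirms that equality holds in cases (i) and (iv). The function name and the sample grid are arbitrary choices of ours.

```python
from fractions import Fraction
from itertools import product

def check_triangle_inequality(values):
    """Check |x + y| <= |x| + |y| on all pairs from `values`; return the cases of the proof met."""
    cases = set()
    for x, y in product(values, repeat=2):
        assert abs(x + y) <= abs(x) + abs(y), (x, y)
        if x >= 0 and y >= 0:
            cases.add("i")
            assert abs(x + y) == abs(x) + abs(y)   # equality in case (i)
        elif x >= 0 > y:
            cases.add("ii")
        elif x < 0 <= y:
            cases.add("iii")
        else:
            cases.add("iv")
            assert abs(x + y) == abs(x) + abs(y)   # equality in case (iv)
    return cases

sample = [Fraction(k, 3) for k in range(-6, 7)]   # -2, -5/3, ..., 5/3, 2
print(sorted(check_triangle_inequality(sample)))
```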


Be on the lookout for variations on this theme, such as 


$$|x - y| + |y - z| \ge |x - z|,$$

which follows since $x - z = (x - y) + (y - z)$, so that $|x - y| + |y - z| \ge |x - z|$ by the triangle inequality.
The modulus is most useful for expressing certain inequalities succinctly. 
For example, 


$$a - \varepsilon < x < a + \varepsilon$$
can be written
$$-\varepsilon < x - a < \varepsilon,$$
which translates into

$$|x - a| < \varepsilon.$$


Convergence 


Now we are ready to consider the general notion of representing a real num- 
ber as a ‘limit’ of a sequence, rather than just being a particular decimal 
expansion. As an exercise, the reader should mark, to as large a scale and 
as accurately as possible, the numbers 1·4, 1·41, 1·414, 1·4142, √2, on the
interval between 1 and 2.

The numbers 1·4, 1·41, 1·414, 1·4142 get closer and closer together, until they become indistinguishable from each other and from √2, up to the
accuracy of the drawing. By drawing a more accurate picture we must go
further along the sequence of decimal approximations to √2 before this happens. If we work to an accuracy of $10^{-8}$, then from the eighth term onwards
all points of the sequence are indistinguishable from √2.

This observation motivates the theoretical concept of convergence. Let $\varepsilon$
be any positive real number ($\varepsilon$ is the Greek letter epsilon, for ‘e’, and may
be thought of as the initial letter of ‘error’). For practical convergence of a
sequence $(a_n)$ to a limit l, if we are working to an accuracy $\varepsilon$, we require
there to be some natural number N such that the difference between $a_n$ and l
has size less than $\varepsilon$ when $n > N$. In other words, $|a_n - l| < \varepsilon$. In the following
diagram we cannot distinguish points less than $\varepsilon$ apart; in this case N = 7
and $a_n$ is indistinguishable from l when $n > 7$.
Fig. 2.14 Practical convergence


For theoretical convergence we ask that a similar phenomenon should occur for all positive $\varepsilon$. This is on the explicit understanding that smaller values
of $\varepsilon$ may require larger values of N. In this sense, N is allowed to depend on $\varepsilon$.
Thus we reach: 


Definition 2.7: A sequence $(a_n)$ of real numbers tends to a limit l if, given
any $\varepsilon > 0$, there is a natural number N such that

$$|a_n - l| < \varepsilon \text{ for all } n > N.$$


Mathematicians use various pieces of shorthand notation to express this 
concept. To say ‘the sequence $(a_n)$ tends to the limit l’ we write

$$\lim_{n \to \infty} a_n = l,$$

or

$$a_n \to l \text{ as } n \to \infty.$$

The symbol ‘$n \to \infty$’ is read as ‘n tends to infinity’ and is meant to remind
us that we are interested in the behaviour of $a_n$ as n becomes large (namely
$n > N$ for an appropriately large number N).

The symbol $\infty$ has historical connotations that can have a variety of different meanings. We will return to these in chapters 14 and 15 to see that ideas
that occurred in history and in the minds of growing students can be interpreted formally in very interesting ways. Until then, we will usually refrain
from using the symbol and just write

$$\lim a_n = l.$$


Example 2.8: The sequence 1·1, 1·01, 1·001, 1·0001, ..., for which $a_n = 1 + 10^{-n}$, tends to the limit 1. For, given $\varepsilon > 0$, we have to make

$$|1 + 10^{-n} - 1| < \varepsilon \text{ for } n > N$$

by finding a suitable N. But this follows from Archimedes’ condition: if we
find N to make $10^{-N} < \varepsilon$, then for all $n > N$ we have $10^{-n} < 10^{-N} < \varepsilon$. (If
the theory of logarithms is available, we take $N > \log_{10}(1/\varepsilon)$.)
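Definition 2.7 and Example 2.8 translate directly into a small search. The sketch below finds an N for a given ε and then spot-checks $|a_n - 1| < \varepsilon$ on finitely many later terms. The names are ours, and ε is assumed rational so that comparisons are exact. Only the argument in the text covers all $n > N$.

```python
from fractions import Fraction

def a(n):
    return 1 + Fraction(1, 10 ** n)      # a_n = 1 + 10^(-n)

def find_N(eps):
    """An N with 10^(-N) < eps, found by Archimedes' condition, for a positive rational eps."""
    eps = Fraction(eps)
    N = 1
    while Fraction(1, 10 ** N) >= eps:
        N += 1
    return N

for eps in [Fraction(1, 2), Fraction(1, 1000), Fraction(3, 10 ** 9)]:
    N = find_N(eps)
    # Spot-check the definition on a finite stretch of terms after the Nth.
    assert all(abs(a(n) - 1) < eps for n in range(N + 1, N + 50))
    print(eps, N)
```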


Definition 2.9: A sequence $(a_n)$ which tends to a limit l is called convergent.
If no limit exists, it is said to be divergent.


A convergent sequence can tend to only one limit. For suppose $a_n \to l$
and $a_n \to m$, where $l \neq m$. Take $\varepsilon = \tfrac12|l - m|$. For large enough n (beyond the N required for each of the two limits),

$$|a_n - l| < \varepsilon, \quad |a_n - m| < \varepsilon.$$

From the triangle inequality,

$$|l - m| \le |l - a_n| + |a_n - m| < 2\varepsilon = |l - m|,$$

which is impossible. In other words, if all the terms $a_n$ must eventually be very close to l, they
cannot also be very close to m, because this requires them to be in two
different places at the same time.


Completeness 


Definition 2.10: A sequence $(a_n)$ is increasing if each $a_n \le a_{n+1}$, so that

$$a_1 \le a_2 \le a_3 \le \cdots.$$


Suppose that $(a_n)$ is an increasing sequence. Either the terms $a_n$ increase
without limit, eventually becoming as large as we please, or else there must
be some real number k such that $a_n \le k$ for all n. An example of a sequence
of the first type is 1, 4, 9, 16, 25, ...; one of the latter type is the sequence
of decimal approximations to e: 2·7, 2·71, 2·718, 2·7182, ..., every term of
which is less than 3.


Definition 2.11: If there exists a real number k such that $a_n \le k$ for all n
we say that $(a_n)$ is bounded.


If we draw the points of a bounded increasing sequence on a part of the 
real line we need only draw the interval between $a_1$ and k, since all the other
points lie inside this. So a typical picture is:

Fig. 2.15 A bounded increasing sequence
It seems visually evident that the terms become increasingly squashed together, and tend to some limit $l \le k$. This intuition is correct if we consider
sequences of real numbers and real limits, but it is wrong for sequences of
rational numbers and rational limits. In fact the sequence of decimal approximations to √2 is an increasing sequence of rational numbers with no
rational number as limit.

The fact that every bounded increasing sequence of real numbers tends to a real number as limit is known as the completeness property of the real numbers. The
origin of the name is that the rational numbers are ‘incomplete’ because
numbers like √2 are ‘missing’. As we consider a formal approach to the real
numbers, we will see this idea in a new light.

We can make the completeness property of the reals very plausible in 
terms of our ideas about decimals. Let $(a_n)$ be an increasing sequence of real
numbers, with $a_n \le k$ for all n.

The set of integers between $a_1 - 1$ and k is finite, so there is an integer $b_0$
that is the largest integer for which some term $a_n$ of the sequence is $\ge b_0$.
Now all terms $a_n$ are less than $b_0 + 1$.

Fig. 2.16 Later terms between successive integers

We subdivide the interval from $b_0$ to $b_0 + 1$ into ten parts, and find $b_1$ so
that some term $a_n \ge b_0 + b_1/10$, but no term $a_n \ge b_0 + (b_1 + 1)/10$. Continuing
in this way we get a sequence of decimals

$$b_0,\quad b_0{\cdot}b_1,\quad b_0{\cdot}b_1b_2,\quad \ldots$$

such that, if $a_{n_r}$ is a term with $a_{n_r} \ge b_0{\cdot}b_1b_2\ldots b_r$, then for $n \ge n_r$ the term $a_n$ lies between $b_0{\cdot}b_1b_2\ldots b_r$ and
$b_0{\cdot}b_1b_2\ldots b_r + 1/10^r$ (because the sequence is increasing). Then the real number

$$l = b_0{\cdot}b_1b_2\ldots$$

also lies in this interval, so it has the property that $|a_n - l| \le 1/10^r$ for all $n \ge n_r$. Given $\varepsilon > 0$, Archimedes’ condition lets us choose r with $1/10^r < \varepsilon$; hence $a_n \to l$ as $n \to \infty$.
It is easy to check that this l is less than or equal to k.
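The digit-by-digit construction above can be sketched in Python. It cannot be run literally, because ‘some term of the sequence’ ranges over infinitely many terms. The sketch (all names are ours) therefore searches only the first `search_limit` terms. That is enough for the well-behaved example of the decimal approximations to √2, but it is an assumption and not part of the argument.

```python
import math
from fractions import Fraction

def sqrt2_approx(n):
    """a_n = sqrt(2) to n decimal places: increasing and bounded above by 2."""
    return Fraction(math.isqrt(2 * 10 ** (2 * n)), 10 ** n)

def limit_digits(term, places, search_limit=100):
    """b_0 and digits b_1..b_places of the limit of a bounded increasing sequence.

    'Some term a_n >= t' is tested only on a_1, ..., a_search_limit, a finite
    stand-in (our assumption) for the search over the whole sequence.
    """
    def some_term_at_least(t):
        return any(term(n) >= t for n in range(1, search_limit + 1))

    b0 = math.floor(term(1))                 # a_1 itself reaches floor(a_1)
    while some_term_at_least(b0 + 1):        # the largest integer reached by some term
        b0 += 1
    approx, digits = Fraction(b0), []
    for r in range(1, places + 1):
        step = Fraction(1, 10 ** r)
        b = 0
        while b < 9 and some_term_at_least(approx + (b + 1) * step):
            b += 1
        digits.append(b)
        approx += b * step
    return b0, digits

print(limit_digits(sqrt2_approx, 6))
```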


Decreasing Sequences 
There is no need to be obsessed with increasing sequences. 
Definition 2.12: A sequence $(a_n)$ is decreasing if $a_n \ge a_{n+1}$ for all n. If
it satisfies $a_n \ge k$ for all n then k is a lower bound and the sequence is
bounded below. (To avoid ambiguity with increasing sequences we can now
say ‘bounded above’ instead of ‘bounded’.) There is a similar theorem concerning decreasing sequences, but instead of copying out the proof again and
changing the inequalities we use a trick. If $(a_n)$ is decreasing, then $(-a_n)$ is increasing. If $a_n \ge k$ for all n then $-a_n \le -k$ for all n, so $(-a_n)$ is bounded above,
hence tends to a limit l. It follows easily that $a_n \to -l$. Hence any decreasing
sequence of real numbers bounded below by k tends to a limit $-l \ge k$.


Different Decimal Expansions for the Same Real 
Number 


Previously we expanded a real number x as an infinite decimal, 
$x = a_0{\cdot}a_1a_2\ldots$, by using the inequalities

$$a_0 + \frac{a_1}{10} + \cdots + \frac{a_n}{10^n} \le x < a_0 + \frac{a_1}{10} + \cdots + \frac{a_n + 1}{10^n},$$

where $a_0$ is an integer and $a_n$ is an integer from 0 to 9 for $n \ge 1$. This
condition can be written

$$a_0{\cdot}a_1a_2\ldots a_n \le x < a_0{\cdot}a_1a_2\ldots a_n + 1/10^n. \tag{2.1}$$

This, used successively for n = 1, 2, 3, ..., gives a unique decimal expansion for any real number, and different real numbers have different decimal
expansions. However, this is not quite the whole story since certain decimal expansions do not occur when we use condition (2.1). For example the
expansion 0·999999..., where $a_0 = 0$ and $a_n = 9$ for all $n \ge 1$, does not
occur.

Why does this happen? Suppose there were a real number x with decimal 
expansion (according to (2.1)) 0·999999.... Then

$$0{\cdot}99\ldots9 \le x < 0{\cdot}99\ldots9 + 1/10^n,$$

where there are n 9s each time. Therefore

$$1 - 1/10^n \le x < 1,$$

or

$$0 < 1 - x \le 1/10^n$$

for all $n \in \mathbb{N}$. But this is impossible by Archimedes’ condition: since $1 - x > 0$
there must exist n with $1/10^n < 1 - x$.
The reason why this sequence of 9s cannot occur is our choice of inequalities in (2.1). If instead we use

$$a_0{\cdot}a_1a_2\ldots a_n < x \le a_0{\cdot}a_1a_2\ldots a_n + 1/10^n \tag{2.2}$$

then we get an equally useful definition of the decimal expansion, and it is
easy to see that the expansion of the number x = 1 now takes the form
0·999999....

However, the second rule (2.2) will now never give us the expansion
1·000000....
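For a rational x the two rules are easy to compare. Rule (2.1) picks $\lfloor 10^n x \rfloor / 10^n$, while rule (2.2) picks $(\lceil 10^n x \rceil - 1)/10^n$. These differ only when $10^n x$ is a whole number, that is, when x has a finite expansion. Here is a sketch using exact fractions (the function names are ours):

```python
import math
from fractions import Fraction

def expansion_21(x, n):
    """a_0.a_1...a_n chosen by (2.1): value <= x < value + 1/10^n."""
    return Fraction(math.floor(x * 10 ** n), 10 ** n)

def expansion_22(x, n):
    """a_0.a_1...a_n chosen by (2.2): value < x <= value + 1/10^n."""
    return Fraction(math.ceil(x * 10 ** n) - 1, 10 ** n)

def as_decimal(q, n):
    """Display a non-negative q that has at most n decimal places."""
    whole, frac = divmod(int(q * 10 ** n), 10 ** n)
    return f"{whole}.{frac:0{n}d}"

for x in [Fraction(1), Fraction(315, 100), Fraction(1, 3)]:
    print(x, as_decimal(expansion_21(x, 5), 5), as_decimal(expansion_22(x, 5), 5))
```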

These are the only possibilities. For example, if a number x has two
different decimal expansions, then, without loss of generality, we can take

$$x = a_0{\cdot}a_1\ldots a_{n-1}a_na_{n+1}\ldots = a_0{\cdot}a_1\ldots a_{n-1}b_nb_{n+1}\ldots \quad\text{where } a_n < b_n.$$

Multiply through by $10^n$ to get

$$a_0a_1\ldots a_{n-1}a_n{\cdot}a_{n+1}\ldots = a_0a_1\ldots a_{n-1}b_n{\cdot}b_{n+1}\ldots \quad\text{where } a_n < b_n.$$

Subtracting the whole number $a_0a_1\ldots a_{n-1}a_n$ gives

$$0{\cdot}a_{n+1}a_{n+2}\ldots = k{\cdot}b_{n+1}b_{n+2}\ldots \quad\text{where } k = b_n - a_n > 0 \text{ is a positive integer.}$$

But the first decimal is at most $0{\cdot}999\ldots$, whose value is at most 1, and the second is at least
the positive integer k. So they can be equal only if k = 1 and both decimals
represent the same limiting value 1. In this case, $a_{n+1} = a_{n+2} = \cdots = 9$,
$b_{n+1} = b_{n+2} = \cdots = 0$ and $b_n = a_n + 1$.

For example, 3·14999... equals 3·15000....

This proves that an infinite decimal expansion is unique, except when 
one representation is finite, given by (2.1), and the other ends in an infinite 
number of 9s, given by (2.2). 

It is important not to think that 0·999... is a number ‘infinitely
smaller’ than 1. They are just two different ways of writing the same real 
number. 

It is convenient to allow both notations because under certain circum- 
stances a calculation may give rise to the infinite sequence of 9s. This will 
happen using the method given earlier to find the decimal expansion of the 
limit of a bounded increasing sequence. 


Example 2.13: Suppose $a_1 = 1$ and in general $a_{n+1} = a_n + (\tfrac12)^n$; then trivially $(a_n)$ is increasing and a calculation gives $a_n = 2 - (\tfrac12)^{n-1}$, so the sequence
is bounded above by 2. Using the same method to calculate the decimal expansion using definition (2.2) instead of (2.1), the limit of the sequence $(a_n)$
is then found to be

$$b_0{\cdot}b_1b_2\ldots b_n\ldots = 1{\cdot}99\ldots9\ldots.$$
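Here is a check of Example 2.13 with exact fractions. The function names are ours, and the 60 terms and 15 places are arbitrary cut-offs. The recurrence agrees with the closed form. For every number of places tried, some term reaches $1{\cdot}99\ldots9$ while no term reaches 2, which is why each digit after $b_0 = 1$ comes out as 9.

```python
from fractions import Fraction

def example_terms(count):
    """a_1 = 1 and a_(n+1) = a_n + (1/2)^n, generated from the recurrence."""
    a, terms = Fraction(1), [Fraction(1)]
    for n in range(1, count):
        a += Fraction(1, 2 ** n)
        terms.append(a)
    return terms

def first_term_at_least(terms, t):
    """Index n (from 1) of the first term >= t, or None if no listed term reaches t."""
    return next((n for n, a in enumerate(terms, start=1) if a >= t), None)

terms = example_terms(60)
# The closed form a_n = 2 - (1/2)^(n-1) agrees with the recurrence.
assert all(a == 2 - Fraction(1, 2 ** (n - 1)) for n, a in enumerate(terms, start=1))

# Some term reaches 1.99...9 (k nines) = 2 - 10^(-k), but no term reaches 2,
# so each step of the construction chooses the digit 9.
for k in range(1, 16):
    print(k, first_term_at_least(terms, 2 - Fraction(1, 10 ** k)))
print(first_term_at_least(terms, 2))
```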
To cover all cases, we introduce the following: 


Definition 2.14: The value of an infinite decimal $a_0{\cdot}a_1a_2\ldots a_n\ldots$ is
the limit l of the sequence $(d_n)$ of decimals to n decimal places, where
$d_n = a_0{\cdot}a_1a_2\ldots a_n$.


Using this definition, 0·333... is 1/3, and 0·999... is 1.
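The exact gaps in these two examples can be checked with fractions (the helper `d` is ours): $1/3 - d_n = 1/(3 \cdot 10^n)$ and $1 - d_n = 1/10^n$. Each gap is positive for every n, which feeds the intuition that 0·999... is ‘just less than 1’. Yet by Archimedes’ condition each gap falls below any given $\varepsilon$, so the limits are exactly 1/3 and 1.

```python
from fractions import Fraction

def d(digit, n):
    """d_n = 0.dd...d with n copies of the same digit, as an exact fraction."""
    return sum(Fraction(digit, 10 ** k) for k in range(1, n + 1))

for n in range(1, 8):
    gap_third = Fraction(1, 3) - d(3, n)   # equals 1/(3 * 10^n)
    gap_one = 1 - d(9, n)                  # equals 1/10^n
    assert gap_third == Fraction(1, 3 * 10 ** n) and gap_one == Fraction(1, 10 ** n)
    print(n, d(9, n), gap_one)
```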

COMMENT. Research has shown that most people initially believe that
0·999... is ‘just less than 1’. The psychological reason seems to be that we
think of a sequence $(a_n)$ not as a list of numbers but as a ‘variable quantity’
that varies as n varies. For example, if $a_n = 1/n$, then we tend to think of
the nth term as varying with n and becoming dynamically smaller and smaller. The variable term in this case gets closer and closer to zero, but never
equals zero. This dynamic intuition makes us believe that 0·999... is ‘just
less than one’ rather than equal to one. It can lead to resistance to accepting
the definition of the value of an infinite decimal as its limiting value.

One of us taught an introductory course [5] on convergence using com- 
puters for students to investigate the numerical convergence of sequences to 
get the sense that if a sequence converged, then, to a given number of places, 
the sequence stabilised onto a fixed value. They were introduced to the idea 
that the limit was the precise value that the sequence stabilised on, leading to 
the formal definition of the limit l of a sequence $(a_n)$, including the specific
example that if $a_n = 1 - 1/10^n$ then the limit l equals 1. Before the course, as
expected, 21 out of 23 stated that $0{\cdot}\dot{9}$ was just less than one and only two said
that it was equal to 1. After the course, the students remained of the same 
opinion. In a class discussion, the general opinion of the students was that 
they knew that the repeating decimal never reached 1, so trying to define it 
equal to one was not possible. 

In order to make sense of formal mathematics, it is essential to get to know 
the definitions and to be aware of precisely what they say. Only then will it 
become possible to build up a coherent formal theory. In this case, the limit 
of a sequence $(a_n)$ is defined to be the fixed number l that it approaches, as
formulated in the definition.


Bounded Sets 


By drawing the picture of a bounded increasing sequence we can actually 
see the limit process in action, as later terms in the sequence pack together 
inside a rapidly decreasing space. We now consider not just a sequence, but
an arbitrary subset $S \subseteq \mathbb{R}$ which is bounded above by some k. This means
that $s \le k$ for all $s \in S$. Is there some concept analogous to the limit?


Fig. 2.17 A set S bounded above 


The naive thing to expect is that S has a greatest element, a number $s_0 \in S$
such that $s_0 \ge s$ for every $s \in S$. Unfortunately, this is not quite right. For
example, if S is the set of all elements of $\mathbb{R}$ that are strictly less than 1, then S
is bounded above, for example by k = 1. However, there is no element in S
that is greater than all the others. For suppose that y were a greatest element.
Then $y \in S$ so $y < 1$, and then

$$y < \tfrac12(y + 1) < 1.$$

So $\tfrac12(y + 1) \in S$, but is greater than the supposedly greatest element y.
However, all is not lost: we just have to be more subtle.
We need some terminology: 


Definition 2.15: A non-empty subset $S \subseteq \mathbb{R}$ is bounded above by $k \in \mathbb{R}$ if
$s \le k$ for all $s \in S$. The number k is called an upper bound for S.


In the previous example the set S has many upper bounds: in fact any $k \ge 1$
is an upper bound for S. Now the set of all upper bounds does have a least 
element. In fact in this example it is 1. In other words, not only is 1 an upper 
bound, but every other upper bound is bigger. 


Definition 2.16: A subset $S \subseteq \mathbb{R}$ has a least upper bound $\lambda \in \mathbb{R}$ if:

(i) $\lambda$ is an upper bound for S,
(ii) if $k \in \mathbb{R}$ is any other upper bound for S, then $\lambda \le k$.


Although upper bounds are ten a penny, a least upper bound must be 
unique. For if $\lambda$ and $\mu$ are least upper bounds for S, then (ii), applied to
each of them, tells us that $\lambda \le \mu$ and $\mu \le \lambda$, so that $\lambda = \mu$.
Examples 2.17:


(1) If S is the set of all integers, then S has no upper bounds, so certainly 
no least upper bound. 

(2) If S is the set of all real numbers less than or equal to 49, then 49 is the
least upper bound of S.

(3) If S is the set of all decimal approximations 1·4, 1·41, 1·414, … to √2,
then the least upper bound of S is √2.

(4) If S is the set of all rational numbers r such that r² < 2, then √2 is the
least upper bound of S.


In example (2) the least upper bound is an element of S, but in examples 
(3) and (4) it is not. So even when least upper bounds exist, they may not be 
members of the original set. 
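The decimal approximations in example (3) can be generated exactly. This sketch assumes Python 3.8+ for `math.isqrt` and uses `fractions.Fraction`; the helper `truncation` is hypothetical. For a few values of n it checks that each truncation lies below √2 while adding one unit in the last place overshoots. This is why √2, which is not itself a member of S, is the least upper bound. A finite computation proves nothing about all n.

```python
import math
from fractions import Fraction

def truncation(n: int) -> Fraction:
    """sqrt(2) truncated to n decimal places, as an exact rational.

    math.isqrt(2 * 10**(2*n)) is the integer part of sqrt(2) * 10**n.
    """
    return Fraction(math.isqrt(2 * 10 ** (2 * n)), 10 ** n)

terms = [truncation(n) for n in range(1, 8)]   # 1.4, 1.41, 1.414, ...

for n, t in enumerate(terms, start=1):
    step = Fraction(1, 10 ** n)
    # t is below sqrt(2) (t*t < 2), but t + 10**-n is already above it,
    # so each term lies in S and is within 10**-n of sqrt(2).
    print(n, float(t), t * t < 2, (t + step) ** 2 > 2)
```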

There is once more a parallel set of concepts. 


Definition 2.18: A subset S is bounded below if there exists k ∈ R with
k ≤ s for all s ∈ S, and k is then called a lower bound. The number μ ∈ R is
a greatest lower bound for S if:


(i) μ is a lower bound for S,
(ii) if k is another lower bound for S, then μ ≥ k.


A similar trick to that used on decreasing sequences allows us to refer all
problems on greatest lower bounds back to least upper bounds. In fact all the
basic properties of upper bounds hold for lower bounds, provided that we
interchange ≥ and ≤.
COMMENT. One student, who continued to think that ‘zero point nine repeating
is just less than one’ despite all efforts to convince him otherwise,
also believed that the least upper bound of a set was always a member of the 
set. He was invited to consider the set S of real numbers less than one. He 
declared that this proved his point because the least upper bound of S was, 
in his view, equal to zero point nine repeating, which is just less than one [1]. 
This belief may prove to be very difficult to overcome. As another student 
commented at the end of his first course: ‘I understand it should be 1... and
that the limit of the sequence is actually 1. It’s down to notation. It’s just a bit
hard to let go of 0·9999 recurring...’ [6].

Strong beliefs based on human intuition can impede the appreciation
of a more formal approach using definitions. However, you will not make
progress in formal mathematics unless you build carefully on the defin-
itions as given. The limit is the fixed value that the terms of the
sequence approximate. For example, the notation 1·414… in the context of
the sequence of decimal expansions of ‘√2 to n decimal places’ denotes the
limit of the sequence, which is the fixed real number √2.

To make progress in building mathematics from definition and proof, it is 
important to know the definition and how to make deductions from it. Only 
then will formal mathematics build into a coherent structure. For example, 
using formal definition and deduction, the completeness property of the real 
numbers can be used as a foundation to prove more general properties of the 
real numbers, such as: 


Proposition 2.19: Every non-empty subset of R that is bounded above has 
a least upper bound. 


Note the careful use of the adjective ‘non-empty’. This is necessary because 
any number is an upper bound for a set with no elements. The proof of the 
above proposition can be made plausible by using decimal expansions in the 
same sort of way that we dealt with increasing sequences. It is more straight- 
forward to deal with lower bounds and then use the trick to convert to upper 
bounds. This means we look at: 


Proposition 2.20: Every non-empty subset of R which is bounded below 
has a greatest lower bound. 


Proof: Let S ⊆ R be non-empty and bounded below, and let a₀ be the largest
integer that is a lower bound for S. Let a₁ be the largest integer between 0
and 9 for which a₀·a₁ is a lower bound for S. Then, generally, let aₙ be the
greatest integer between 0 and 9 for which a₀·a₁a₂…aₙ is a lower bound for S.
(Here a₀·a₁a₂…aₙ means a₀ + a₁/10 + a₂/10² + ⋯ + aₙ/10ⁿ, so a₀ may be
negative.) We claim that


λ = a₀·a₁a₂…


is the greatest lower bound. The proof is mainly a matter of unravelling 
decimal notation, and is complicated by the occurrence of ‘carry digits’ in 
arithmetic. 

First, we show that λ is a lower bound. If not, there exists s ∈ S such that
s < λ. By Archimedes’ condition there exists n ∈ N such that 10⁻ⁿ < λ − s.
Therefore aₙ can be reduced by 1 in the definition of λ; or, if aₙ = 0, some
earlier aₘ > 0 can be reduced by 1. But this contradicts the definition of λ.

Then we show that every lower bound μ is less than or equal to λ. If
not, μ > λ, so by Archimedes’ condition there exists n ∈ N such that
10⁻ⁿ < μ − λ. Therefore aₙ can be increased by 1 in the definition of λ; or,
if aₙ = 9, some earlier aₘ < 9 can be increased by 1. But this contradicts the
definition of λ.
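The digit-by-digit construction in this proof can be run on a concrete set. The sketch below is illustrative only. It assumes we are handed a test `is_lower_bound` for S, a hypothetical helper (for a general subset of R no such test need be computable), together with an integer already known to be a lower bound. It then chooses a₀, a₁, … as in the proof, treating a₀·a₁…aₙ as a₀ + a₁/10 + ⋯ + aₙ/10ⁿ, and uses exact rational arithmetic. The example set S = {x ∈ Q | x > 0 and x² > 2} is our own choice; its greatest lower bound is √2.

```python
from fractions import Fraction
from typing import Callable, List

def glb_digits(is_lower_bound: Callable[[Fraction], bool],
               start: int, n_digits: int) -> List[int]:
    """Return [a0, a1, ..., a_n] as chosen in the proof of Proposition 2.20.

    Assumes is_lower_bound(t) decides whether t is a lower bound for a
    non-empty set S, and that `start` is an integer known to be a lower bound.
    """
    # a0: the largest integer lower bound. S is non-empty, so this loop stops.
    a0 = start
    while is_lower_bound(Fraction(a0 + 1)):
        a0 += 1
    digits = [a0]
    approx = Fraction(a0)     # a0 + a1/10 + ... + ak/10**k built so far
    place = Fraction(1)
    for _ in range(n_digits):
        place /= 10
        # Largest digit keeping the truncation a lower bound; 0 always works,
        # because approx itself is already a lower bound.
        d = max(digit for digit in range(10)
                if is_lower_bound(approx + digit * place))
        digits.append(d)
        approx += d * place
    return digits

# Illustrative set: S = {x in Q | x > 0 and x*x > 2}, with greatest lower bound
# sqrt(2). A rational t is a lower bound for S exactly when t <= 0 or t*t <= 2.
def sqrt2_lower_bound(t: Fraction) -> bool:
    return t <= 0 or t * t <= 2

print(glb_digits(sqrt2_lower_bound, 0, 6))
```

For a set whose greatest lower bound is −1·5, the same procedure returns a₀ = −2 followed by the digits 5, 0, …, because −2 + 5/10 = −1·5.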
We have remarked that it is not possible to make a drawing sufficiently
accurate to distinguish rational numbers from irrational ones. But questions 
of upper and lower bounds expose a vital theoretical difference between real 
and rational numbers. Examples (3) and (4) above are bounded sets of ra- 
tionals with no rational least upper bound. In this sense, R is complete but Q 
is not. It is this property that will play a vital role when we come to a formal 
definition of the real numbers later in this book. 


Exercises 


1. Assuming any results you need about prime factorisation of natural
numbers, show that every positive rational number can be written in
exactly one way as a product

$$r = p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_n^{\alpha_n}$$

where p₁ = 2, p₂ = 3, … are the primes in increasing order and each αᵢ
is an integer (positive, negative, or zero).
Write the following rationals in this manner: 14/45, 3/8, 2, 20/45.
Show that √r is rational precisely when all of the powers α₁, α₂, …
are even. Deduce that for a positive integer n, √n is irrational if and
only if n is not the square of an integer.


2. Extend the result of exercise 1 to find those rational numbers r such
that ∛r (cube root of r) is irrational. Show that ⁿ√3 is irrational for all
natural numbers n ≥ 2.


3. Which of the following statements are true?

(a) If x is rational and y is irrational, then x + y is irrational.

(b) If x is rational and y is rational, then x + y is rational.

(c) If x is irrational and y is rational, then x + y is rational.

(d) If x is irrational and y is irrational, then x + y is irrational.
Prove the true ones and give examples to disprove the false ones.


4. Prove that between any two distinct real numbers there exist infinitely
many rational numbers and infinitely many distinct irrational numbers.
(Here, ‘infinitely many’ means that given any natural number n,
there exist at least n numbers with the required property.)


5. For real numbers a, r and natural number n, let sₙ = a + ar + ⋯ + arⁿ.
Show that rsₙ − sₙ = a(rⁿ⁺¹ − 1) and deduce that

$$s_n - \frac{a}{1-r} = -\frac{a r^{n+1}}{1-r} \qquad \text{for } r \neq 1.$$

For |r| < 1 deduce that sₙ → a/(1 − r) as n → ∞.
6. Prove that an infinite decimal x = a₀·a₁a₂a₃… is a rational number
if and only if it is ‘eventually recurring’, that is, after some n onwards it
repeats the same block of digits indefinitely:


$$x = a_0{\cdot}a_1 \ldots a_n\, a_{n+1} \ldots a_{n+k}\, a_{n+1} \ldots a_{n+k}\, a_{n+1} \ldots a_{n+k} \ldots$$


(Hint: One way round, use question 5 with a = aₙ₊₁…aₙ₊ₖ/10ⁿ⁺ᵏ and
r = 1/10ᵏ.)


7. Let 
y = 0·1234567891011121314151617181920…,


whose digits are the natural numbers in decimal form, strung end to 
end. Prove that y is irrational. 
Is 


0·101001000100001…,


where each successive string of 0s has one more digit, rational or 
irrational? 


8. Say whether each of the following sequences (aₙ) tends to a limit, and
if so, what the limit is. Use the ε-N definition to prove your answers
are correct.

(a) aₙ = n²

(b) aₙ = 1/(n² + 1)

(c) aₙ = 1 + ½ + ¼ + ⋯ + (½)ⁿ
(d) aₙ = (−1)ⁿ

(e) aₙ = (−3)ⁿ


2 NUMBER SYSTEMS | 45 


PART II
The Beginnings of Formalisation 


The next five chapters develop the techniques we need to place mathematical 
reasoning on a firmer logical basis. We still permit the use of our intuitive 
ideas, but now only as motivation for the concepts introduced, and no longer 
as an integral part of the reasoning. 

In the ‘building’ metaphor, we are getting together the bricks, cement, 
timber, tiles, pipes, and other materials, and assembling a workforce of brick- 
layers, plasterers, joiners, and plumbers to put them together in the right 
way. In the ‘plant’ metaphor, it is a question of flowerpots, stakes, forks, and 
trowels, and a good stock of insecticide to keep the bugs off. 

We concentrate on two main ideas: the use of set theory as a source of 
raw material, and the use of mathematical logic to ensure that the proofs 
of theorems are rigorous and sound. There are three chapters on sets and 
related topics, followed by two on logic. We approach both from the point 
of view of a practical mathematician who is more interested in using them to 
do mathematics than in their own internal workings. 


CHAPTER 3 


Sets 


In accordance with the point of view stated in chapter 1, we make no attempt
to give a precise definition of the concept ‘set’. This will not prevent
us from explaining what a set is. A set is any collection of objects
whatsoever. The word ‘collection’ is not intended to imply anything about the
number of objects in the set: it may be finite or infinite; there may be just one 
object, or even none. Nor is there any intention to imply any uniformity in 
the type of object used to make up the set: a perfectly good set might consist 
of three numbers, two triangles, and a function. Obviously such a broad con- 
cept allows vast scope for whimsical examples. However, the sets of interest 
in mathematics consist only of mathematical objects. At an elementary level 
we encounter sets of numbers, sets of points in the plane, sets of geometrical 
curves, sets of equations. In more advanced mathematics, there is an enor- 
mous variety of sets; in fact almost all the concepts of interest are built up 
from a set-theoretical standpoint. 

Nowadays the concept ‘set’ is considered to be fundamental to the whole 
of mathematics—even more fundamental than the concept ‘number’, which 
earlier ages plumped for. There are many reasons for this. One is that the 
solution of equations usually yields a set of solutions, rather than just one; 
quadratic equations, for example, usually have two solutions. Again, modern 
mathematics places emphasis on generality. Interesting theorems tend to 
apply to a variety of cases. Pythagoras’ theorem is important, not because it 
applies to one particular right-angled triangle, but because it applies to all of 
them. It thereby expresses a property of the set of all right-angled triangles. 
The concept of a ‘group’ (which we describe later, particularly in chapter 13) 
appears in many guises throughout the whole of mathematics. The language 
of sets helps us formulate general properties of a group, which therefore 
apply to all of its realisations. It is this power of expression of general 
concepts using set-theoretic language that gives modern mathematics its 
distinctive flavour. 
To deal with all of the sets that arise in mathematics, it is easiest to develop
first those general properties common to all sets, and then to apply them in 
more special situations. For the rest of this chapter we concentrate on various 
natural ways to combine and modify sets to form other sets. The systematic 
study of these methods leads to a kind of ‘algebra’ of sets, in the same way 
that a systematic study of the general properties of numbers and operations 
such as addition, subtraction, multiplication, and division leads to an algebra 
of numbers. 


Members 


The objects that together make up a given set are called the members or elem- 
ents of the set. The members themselves are said to belong to the set. To 
express symbolically that an element x belongs to a set S, we write 


x ∈ S.
If x does not belong to S, we write
x ∉ S.


In order to know which set is under consideration, we must know exactly 
which objects are members. Conversely, if we know the exact membership, 
we know which set the members form. Being pedantic about this is not as 
silly as it might seem, because we often describe the same set in different 
ways; we can be sure that we are dealing with the same set by looking at its 
members. For example, if A is the set of solutions of the equation 


x² − 6x + 8 = 0,


and B is the set of even integers between 1 and 5, then A and B both have 
precisely two members, 2 and 4. This means that A and B are the same set. 
It is sensible, therefore, to say that two sets are equal if they have the same 
members. Equality of two sets S and T is expressed in the usual way by 


S = T,
and if S and T are not equal, we write
S ≠ T.


This apparently trite criterion for equality of sets has some interesting 
consequences, as we see in a moment. 

The simplest way to specify a set is to list its members (if that is possible). 
The standard notation is to enclose the list in braces { }. So 


S = {1, 2, 3, 4, 5, 6}
means that S is the set whose members are the numbers 1, 2, 3, 4, 5, 6, and
only these. As another example, if 


T = {79, π², √(5 + √7), ½},


then the members of T are the numbers 79, π², √(5 + √7), and ½.

Two features of this notation should be emphasised; both are conse- 
quences of our notion of equality of sets. First, it is immaterial in which order 
we write the list of members. The set {5, 4, 3, 2, 6, 1} is the same as the set S 
above, and so is the set {3, 5, 2, 1, 6, 4}. Why? Because in all three cases, we 
have the same members, namely 1, 2, 3, 4, 5, and 6. The order within the 
braces arises not from any mathematical cause, but from our conventions 
about writing from left to right. Second, if elements are repeated in the list, 
this does not alter the set either. For instance, {1, 2, 3, 4, 6, 1, 3, 5} is just 
our old friend S again. Once more, there is a reason for this seemingly pe- 
culiar convention. We might combine two lists to give the set consisting of, 
say, all the proper divisors of 12, namely 1, 2, 3, 4, 6, together with the odd 
numbers less than 6, which are 1, 3, 5. Just writing one list after the other 
gives precisely what we have written. In this case it would have been quite 
easy to go through and cross out repeats, but in general, it is better to retain 
flexibility of notation and allow repeats. Our convention about a set being 
specified by its members implies that all of the various ways of specifying S 
have precisely the members 1, 2, 3, 4, 5, 6 and no others. 
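Python's built-in `set` type follows the same convention, that a set is determined by its members alone, so it is a convenient place to experiment with the examples above. This is only an analogy: Python sets are finite collections of hashable values, not sets in the full mathematical sense. The search range for the quadratic is our own assumption.

```python
S = {1, 2, 3, 4, 5, 6}
same_1 = S == {5, 4, 3, 2, 6, 1}       # order of listing is irrelevant
same_2 = S == {3, 5, 2, 1, 6, 4}

# Repeats are absorbed: proper divisors of 12, then odd numbers below 6.
combined = set([1, 2, 3, 4, 6] + [1, 3, 5])

# The earlier example: solutions of x^2 - 6x + 8 = 0 versus even integers
# between 1 and 5. Searching only the integers -10..10 is an assumption
# that happens to suffice here, since the roots are 2 and 4.
A = {x for x in range(-10, 11) if x * x - 6 * x + 8 == 0}
B = {x for x in range(1, 6) if x % 2 == 0}

print(same_1, same_2, combined == S, A == B)
```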

These peculiarities of notation have no great conceptual significance. We 
are used to the fact that when writing fractions, we can get different symbols 
for the same number: ½ = 2/4 = 3/6 and so on. In fact this is one of the most
common usages of the equality sign: when we write x = y we mean that the 
two symbols on either side of the sign are two different names for the same 
thing. For instance, 2 + 2 = 12 ÷ 3 = 5 − 1 = +√16 = 4. We use the same
convention when we write S = T for equality of sets. Having understood 
this, there is no essential difficulty here; we have just raised these questions 
in order to dispose of them. 

When specifying a set, it may not be convenient, or even possible, to write 
down a complete list of all the members. The set of prime numbers is better 
described precisely by that phrase, rather than by the list 


{2, 3, 5, 7, 11, 13, 17, 19, …}.
Indicating a few terms of an infinite set in this manner is open to the same
sort of misinterpretation as writing the first few terms of a sequence, only
slightly worse. A sequence is thought of in order, but according to our
conventions about sets, the elements inside braces are not in any specific
order. So the list above might also be written


{7, 17, 37, 47, 2, 11, 3, 5, …}.


Who could sort out this jumble and say, with hand on heart, that this is the 
set of all primes? We admit that there are occasions when mathematicians 
do use the bracket notation for infinite sets. We do so ourselves sometimes. 
In such cases, it is always clear what is intended. 

In the given case we could be more precise by writing 


P = {all prime numbers}, 
which is self-explanatory. A slight variation on this, which is very useful, is 
P = {p | p is a prime number}.


Here the braces are read as ‘the set of all ...’, the vertical line as ‘such that’, 
and the whole symbol reads ‘the set of all p such that p is a prime number’, 
which obviously means ‘the set of all prime numbers’. In general a definition 
of the type 


Q = {x | something or other involving x}


means that Q is the set of all x for which the something or other involving x 
is true. 

To see how useful this notation is, suppose we want to define S to be the 
set of solutions of the quadratic equation 


x² − 5x + 6 = 0.


We could, of course, solve, and define S = {2,3}. Much easier, since it avoids 
solving the equation, is to write 


S = {x | x² − 5x + 6 = 0}.


This gives a precise and unequivocal definition of S. Of course it is no help 
in solving the equation! But that is the point of the whole exercise: we can 
specify the set S without actually doing any calculations. 

There is room for ambiguity in this notation. If we are thinking about 
integers only, then the set 


{x | 1 ≤ x ≤ 5}


consists of the numbers 1, 2, 3, 4, 5. But if we are thinking about real num- 
bers, all the other real numbers between 1 and 5 are included as well. The 
best way out of this impasse is to specify a set Y from which the elements are
to be chosen. The notation 


X = {x ∈ Y | something or other involving x}


means that X is the set of those members x of the given set Y such that 
something or other involving x is true. This is the same as 


X = {x | x ∈ Y and something or other involving x},


but the first notation is preferable because it emphasises the role of Y. 
If Z is the set of integers and R the set of real numbers, then 


{x ∈ Z | 1 ≤ x ≤ 5}


has members 1, 2, 3, 4, 5, while every a ∈ R satisfying 1 ≤ a ≤ 5 is a
member of


{x ∈ R | 1 ≤ x ≤ 5}.


There is an even more serious reason for specifying a set Y from which the
members of the set X are chosen: to make sure that the ‘something or other
involving x’ makes sense for all x ∈ Y. The ‘something or other’ needs to be
a property that is clearly true or false for every x ∈ Y. Then the set X selected
by this property comprises those members of Y for which the property is 
true. 
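In Python the analogue of {x ∈ Y | P(x)} is a set comprehension `{x for x in Y if P(x)}`, and, as in the text, it needs an explicit Y to draw from. The sketch below uses finite stand-ins for Z and R (a window of integers, and a list of halves). That is an assumption forced on us by working with finite data.

```python
from fractions import Fraction

# A finite window of integers standing in for Z (Python cannot list all of Z).
Y = range(-100, 101)

X = {x for x in Y if 1 <= x <= 5}                  # {x in Z | 1 <= x <= 5}
S = {x for x in Y if x * x - 5 * x + 6 == 0}       # integer solutions of x^2 - 5x + 6 = 0

# The same predicate applied to a different ambient set picks out different members.
halves = [Fraction(k, 2) for k in range(-20, 21)]  # some rationals, standing in for R
X_halves = {x for x in halves if 1 <= x <= 5}
print(sorted(X), sorted(S), len(X_halves))
```

With the halves as ambient set, non-integers such as 5/2 now satisfy the predicate, which is exactly the ambiguity that specifying Y removes.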

In English grammar, a sentence is divided into two parts: the subject of the 
sentence, and the rest, called the predicate, which tells us about the subject. 


The Moon is a satellite of the Earth
[subject: ‘The Moon’; predicate: ‘is a satellite of the Earth’]

Because he defied the waves, King Canute got his feet wet
[subject: ‘King Canute’; predicate: the rest of the sentence]

Fig. 3.1 Subject and predicate


A mathematician who customarily uses a symbol like x to denote an un- 
known might say that the predicate in the first sentence is 


x is a satellite of the earth 
and the predicate in the second is
Because he defied the waves, x got his feet wet. 


The beauty of this description is that it specifies the position of the subject 
in the sentence. To get the original sentence back again, we simply substitute 
the appropriate subject in place of x. 

This motivates the mathematical definition: 


Definition 3.1: A predicate is a sentence involving a symbol x so that when
we substitute an element a ∈ Y for x, the resultant statement is clearly either
true or false. We say that the predicate is ‘valid for the set Y’ if this is so.

For instance, the sentence 


1 ≤ x ≤ 5


is a predicate which is valid for the set Z. It is also valid for the set R. Substi- 
tute any integer or real number and we get a statement that is either true or 
false. 


1 ≤ 3 ≤ 5 is true,


1 ≤ 57 ≤ 5 is false,


and so on. 

The set {x ∈ Z | 1 ≤ x ≤ 5} is just the set of x ∈ Z such that the predicate
1 ≤ x ≤ 5 is true.

A predicate need not be restricted just to sets of numbers. For instance, if 
T is the set of triangles in the plane, then the sentence 


x is right-angled 
is a predicate valid for the set T and 
{x ∈ T | x is right-angled}


is just the set of right-angled triangles in the plane. 

We could go on giving examples galore of predicates, but plenty will turn 
up in the text anyway. The reader should make it clear in their own mind 
that whenever the symbolism 


{x ∈ Y | P(x)}


is used, then P(x) is a predicate in x which is valid for all x ∈ Y.
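A predicate need not concern numbers, and the same selection works for the triangle example. In this sketch a Python function returning a `bool` plays the role of P(x). It makes one simplifying assumption, labelled here and in the code: the set T is replaced by a small hypothetical stand-in, namely triangles with integer side lengths at most 15, each recorded by its sorted sides. The predicate ‘x is right-angled’ is tested through the converse of Pythagoras’ theorem.

```python
from itertools import combinations_with_replacement

def is_triangle(a: int, b: int, c: int) -> bool:
    # Triangle inequality for side lengths sorted so that a <= b <= c.
    return a + b > c

def is_right_angled(sides) -> bool:
    # The predicate P(x): "x is right-angled", via the converse of Pythagoras.
    a, b, c = sorted(sides)
    return a * a + b * b == c * c

# Hypothetical finite stand-in for T: integer-sided triangles, sides at most 15.
# combinations_with_replacement yields sorted triples (a <= b <= c).
T = [s for s in combinations_with_replacement(range(1, 16), 3) if is_triangle(*s)]
right = {s for s in T if is_right_angled(s)}    # {x in T | x is right-angled}
print(sorted(right))
```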
Subsets


Within any given set A there exist other sets, obtained by omitting some of 
the elements of A. These are called subsets of A. More formally: 


Definition 3.2: B is a subset of A if every element of B is an element of A. 
We write 


B ⊆ A
or
A ⊇ B.


We also say that B is contained in, or included in, A and that A contains or
includes B. With this definition we have A ⊆ A, for trivial reasons. If B ⊆ A
and B ≠ A then we say that B is a proper subset of A and write


B ⊊ A.


Many mathematicians use ⊂ where we have used ⊆, and others write ⊂ where
we have chosen ⊊. We use ⊆ because it is unambiguous.
The criterion for equality of sets leads to a trivial but useful result: 


Proposition 3.3: Let A and B be sets. Then A = B if and only if A ⊆ B and
B ⊆ A.

Proof: If A = B then, since A ⊆ A, it follows that A ⊆ B and B ⊆ A.
Conversely, suppose that A ⊆ B and B ⊆ A. Then each element of A is an
element of B, and each element of B is an element of A. Hence A and B have
the same elements, so A = B.


In practice this proposition is used to prove equality of two sets when each is 
defined by a predicate. We start with a typical element in A (given in terms 
of the appropriate predicate) and show that this element is also a member 
of B. This verifies that A ⊆ B. Then we carry out a similar argument to
show that B ⊆ A. We will see plenty of examples of this procedure soon (in
propositions 3.8, 3.9, and 3.10, for instance). 

A basic property of subsets is that a subset of a subset is itself a subset: 


Proposition 3.4: If A, B, C are sets with A ⊆ B and B ⊆ C, then A ⊆ C.


Proof: Every element of A is an element of B and every element of B is an 
element of C. Therefore every element of A is an element of C, so A ⊆ C.
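Both propositions can be checked mechanically on a small example, using Python’s `<=` on sets as the test for ⊆. The sketch below runs through every choice of A, B, C among the subsets of {1, 2, 3}; the helper `subsets` is our own. It is a sanity check on one finite case, not a substitute for the proofs, which hold for all sets.

```python
from itertools import chain, combinations

def subsets(universe):
    # All subsets of a finite set, as frozensets (2**len(universe) of them).
    items = sorted(universe)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

P = subsets({1, 2, 3})

# Proposition 3.3: A == B exactly when A <= B and B <= A.
prop_3_3 = all((A == B) == (A <= B and B <= A) for A in P for B in P)

# Proposition 3.4: if A <= B and B <= C then A <= C.
prop_3_4 = all(A <= C for A in P for B in P for C in P if A <= B and B <= C)
print(len(P), prop_3_3, prop_3_4)
```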
WARNING: It is important not to confuse subsets and members: the two con-
cepts are quite different. The members of {1, 2} are 1 and 2. The subsets of
{1, 2} are {1, 2}, {1}, {2}, and a fourth subset which for the moment is best
written as { }.

Further, proposition 3.4 becomes false if we change ‘⊆’ to ‘∈’. Mem-
bers of members need not be members. For example, let A = 1, B = {1, 2},
C = {{1, 2}, {3, 4}}. Then A ∈ B and B ∈ C. But the members of C are {1, 2}
and {3, 4}, so A = 1 is not a member.
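The distinction is easy to see in Python, where `in` tests membership and `<=` tests inclusion. Sets that are to be members of other sets must be immutable `frozenset`s there. That is a technical requirement of the language, not of the mathematics.

```python
A = 1
B = frozenset({1, 2})
C = frozenset({frozenset({1, 2}), frozenset({3, 4})})

print(A in B, B in C, A in C)   # membership is not transitive

# The four subsets of {1, 2}; the empty frozenset is one of them.
subs = [frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2})]
print(all(s <= B for s in subs))

# {1} is a subset of B but not a member of it.
print(frozenset({1}) in B, frozenset({1}) <= B)
```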

Now let us return to that set { }. 


Definition 3.5: A set is empty if it has no members. 


For instance, the set 
{x ∈ Z | x = x + 1}


is empty, because the equation x = x + 1 has no solutions in Z. 

An empty set has remarkable properties (remarkable at first sight, that is) 
by default. For instance, if E is an empty set and X is any set whatsoever, 
then E ⊆ X. Why? We have to show that every element of E is an element of
X. The only way that this can fail is if E has some element e which does not 
belong to X. But E, being empty, has no elements at all, so cannot contain 
any such element. 

This (curious but logical) argument is an example of ‘vacuous reasoning’, 
because it discusses properties of something that does not exist. Vacuous 
reasoning is rarely encountered in everyday argument. However, for math- 
ematicians, it has a unifying feature that allows logical arguments to be used 
in cases where everyday intuition may not apply. Really, we are discussing 
properties that something would have if it did exist, with the aim of obtain- 
ing a contradiction. Then we conclude that it does not exist. So it is useful to 
allow statements about non-existent objects. 

For instance, suppose we have two empty sets E and E’. The above tells us
that E ⊆ E’ and E’ ⊆ E. Then proposition 3.3 tells us that E = E’. All empty
sets are equal. Hence there is a unique empty set. We therefore give it a special
symbol: we write


Ø


to denote the empty set.

This is hardly surprising. In the absence of any elements whatsoever, we 
have no way to distinguish two empty sets. In the words of [31]: ‘the contents 
of two empty paper bags are equal’. 
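Python’s `all` behaves in the same vacuous way: a condition required of every element of an empty collection is automatically satisfied, because there is nothing to check. The sample sets X below are arbitrary choices for illustration.

```python
E = frozenset()
E2 = set()                      # an empty set built a different way

for X in [frozenset({1, 2}), frozenset({"triangle"}), frozenset()]:
    # "Every element of E is an element of X" holds vacuously.
    assert all(e in X for e in E)
    assert E <= X

print(E == E2)                  # two empty sets have the same (no) members
```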
Is There a Universe?


Just as there is an empty set Ø that contains no elements, we might ask
whether there is a very large set Ω that includes absolutely everything. This
turns out to be far too fanciful. Such a set would have to be an incredibly vast
rag-bag; if it contained everything it would include all numbers, all elements
of every set, all sets, all places in the universe, the Declaration of Independ-
ence, Winston Churchill, the year 1066, the wit of Oscar Wilde, . . . If we dare
contemplate such an Ω, then Ω itself must be an acceptable concept and we
would have to include it in the collection of absolutely everything. So Ω ∈ Ω.
Most sensible sets do not belong to themselves; in fact you could while away 
an interesting half an hour trying to find such a set. 

However, there is a nastier problem. If we select from the putative set Ω
the subset comprising everything that is a set but does not belong to itself,
we get:


S = {A ∈ Ω | A ∉ A}.
Now ask the key question: is S ∈ S?


If S ∈ S, then, according to the defining predicate, S ∉ S.
If S ∉ S, then S satisfies the defining predicate, so S ∈ S.


Our flight of fancy in assuming the existence of a universe Ω has led to a
paradox. Therefore there cannot be a universal set.

Can we salvage the situation by removing all the whimsical things and 
concentrating on a universal set for the realm of mathematics? No: this too 
has its pitfalls. If we try to contemplate a set Ωₘ of all mathematical ob-
jects (whatever that means), then we reach the same contradiction when we
consider the subset of Ωₘ consisting of all mathematical objects that do not
belong to themselves. 

To avoid such paradoxes, it is essential for the sets we consider to be clearly 
defined, in the sense that in principle we know precisely which objects are 
members and which are not. 

The non-existence of a universal set is another reason why the notation 


{x ∈ Y | P(x)},
where Y is a known set and P(x) is a predicate, is preferable to 
{x | P(x)}. 


Having specified the set Y, we can investigate the predicate P(x) and make 
sure it is valid for all elements of Y before selecting those elements in Y for 
which the predicate is true. Used indiscriminately, the notation {x | P(x)}, 
which allows us to try absolutely any object x to see if it is a member, is like
considering {x ∈ Ω | P(x)}. But there is no universal set Ω. If we don’t specify
a set Y, then we have unlimited choice of objects x to try in P(x). We might 
consider an element that had not been intended at the outset, and end up 
with a paradox again. 

Here is an example. If Z is the set of integers, R the set of real numbers, 
and T the set of all triangles in the plane, then Z ∉ Z, R ∉ R, T ∉ T. If Y is
the set whose members are Z, R, T, then


{x ∈ Y | x ∉ x} = {Z, R, T}.


On this set Y, the property x ∉ x is a perfectly acceptable predicate. However,
if we consider


{x | x ∉ x}


with no restrictions on x, imagination can run riot. Considering
S = {x | x ∉ x} itself, we end up with the same contradiction as before: S ∈ S
if and only if S ∉ S.

The moral is that set theory is a system of notation, not a magical pre-
scription. As such, it is only as good as the manner in which it is used. When used
sensibly, it behaves well. But if you use it badly, things can go wrong.


Union and Intersection 


Two important methods for combining sets are known as the union and 
intersection. 


Definition 3.6: The union of two sets A and B is the set whose elements are 
those of A together with those of B. We write A ∪ B to denote the union of A
and B. Now


A ∪ B = {x | x ∈ A or x ∈ B (or both)}.


For example, if 
A= {1,2, 3} 
B = {3,4,5} 


then the union is {1, 2, 3, 4, 5}. 


Definition 3.7: The intersection of A and B is the set whose elements belong
both to A and to B. The symbol for the intersection is A ∩ B. In this case,


A ∩ B = {x | x ∈ A and x ∈ B}.
For example, with A and B as above, their intersection is {3}, because only 3
belongs to both of them.
The intersection can also be written as


A ∩ B = {x ∈ A | x ∈ B},


so we can think of it being the subset of A selected using the predicate x ∈ B.
(Equivalently we could think of it as being the subset of B which satisfies
the predicate x ∈ A.) The union, on the other hand, involves constructing
a new set which is (usually) bigger than both A and B, so here we have an 
example of a set construction that does not select elements from a previously 
prescribed set Y. 
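In Python, `|` and `&` on sets play the roles of ∪ and ∩. This short sketch repeats the example above and also forms the intersection as a selection from A using the predicate x ∈ B.

```python
A = {1, 2, 3}
B = {3, 4, 5}

union = A | B                            # A ∪ B
intersection = A & B                     # A ∩ B
selected = {x for x in A if x in B}      # {x ∈ A | x ∈ B}
print(union, intersection, selected == intersection)
```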

The operations of union and intersection obey certain standard laws. Most 
of them are obvious, but for convenience we list them in the next three 
propositions. 


Proposition 3.8: Let A, B, C be sets. Then 


(a) A ∪ Ø = A

(b) A ∪ A = A

(c) A ∪ B = B ∪ A

(d) (A ∪ B) ∪ C = A ∪ (B ∪ C).


Proof: Only (d) is remotely difficult, so we leave the first three as an exercise. 
Before trying them, however, read the proof of (d). 

Suppose that x ∈ (A ∪ B) ∪ C. Then either x ∈ A ∪ B or x ∈ C. If x ∈ C, then
x ∈ B ∪ C, so x ∈ A ∪ (B ∪ C). If not, then x ∈ A ∪ B, so either x ∈ A or x ∈ B.
In either case x ∈ A ∪ (B ∪ C). So we have proved that if x ∈ (A ∪ B) ∪ C
then x ∈ A ∪ (B ∪ C), that is:


(A ∪ B) ∪ C ⊆ A ∪ (B ∪ C).
A similar argument shows that


A ∪ (B ∪ C) ⊆ (A ∪ B) ∪ C.


Using proposition 3.3 we obtain equality. 


This proof is more complicated than the situation really warrants, because it
is obvious that (A ∪ B) ∪ C is the set whose members are those of A, those of
B, and those of C together. Clearly this is the same set as A ∪ (B ∪ C). Once
we know this, it is possible to omit the brackets altogether and write just


A ∪ B ∪ C.
Similar results hold for intersections:


Proposition 3.9: 


(a) A ∩ Ø = Ø

(b) A ∩ A = A

(c) A ∩ B = B ∩ A

(d) (A ∩ B) ∩ C = A ∩ (B ∩ C).


The proofs are analogous to those in proposition 3.8.
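As a cross-check, all eight identities of propositions 3.8 and 3.9 can be tested exhaustively on the subsets of a small set. The sketch below uses {1, 2, 3}, an arbitrary choice, and a helper `subsets` of our own. Passing this test on one finite example is evidence, not proof.

```python
from itertools import chain, combinations

def subsets(universe):
    # All subsets of a finite set, as frozensets.
    items = sorted(universe)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

P = subsets({1, 2, 3})
empty = frozenset()

for A in P:
    assert A | empty == A and A | A == A              # 3.8 (a), (b)
    assert A & empty == empty and A & A == A          # 3.9 (a), (b)
    for B in P:
        assert A | B == B | A and A & B == B & A      # 3.8 (c), 3.9 (c)
        for C in P:
            assert (A | B) | C == A | (B | C)         # 3.8 (d)
            assert (A & B) & C == A & (B & C)         # 3.9 (d)
print("identities hold for all", len(P) ** 3, "triples of subsets of {1, 2, 3}")
```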


Finally, there are two equations that mix up unions and intersections: 


Proposition 3.10: 


(a) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
(b) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).


Proof: Let x ∈ A ∪ (B ∩ C). Then either x ∈ A or x ∈ B ∩ C. If x ∈ A
then certainly x ∈ A ∪ B and x ∈ A ∪ C, hence x ∈ (A ∪ B) ∩ (A ∪ C).
Alternatively, x ∈ B ∩ C gives x ∈ B and x ∈ C. Hence x ∈ A ∪ B and
x ∈ A ∪ C, so x ∈ (A ∪ B) ∩ (A ∪ C). This proves that


A ∪ (B ∩ C) ⊆ (A ∪ B) ∩ (A ∪ C). (3.1)


Conversely, suppose y ∈ (A ∪ B) ∩ (A ∪ C). Then y ∈ A ∪ B and y ∈ A ∪ C.
There are two cases to consider: when y ∈ A and when y ∉ A. If y ∈ A, then
certainly y ∈ A ∪ (B ∩ C). On the other hand, if y ∉ A then, since y ∈ A ∪ B,
we must have y ∈ B; similarly y ∈ C. Thus y ∈ B ∩ C, which again implies
y ∈ A ∪ (B ∩ C). Therefore


(A ∪ B) ∩ (A ∪ C) ⊆ A ∪ (B ∩ C).


Together with (3.1), this yields the desired result. 
The proof of (b) is analogous. 


Proposition 3.10 is a pair of ‘distributive laws’, which should be compared 
with the way that multiplication of numbers is distributive over addition: 


a × (b + c) = (a × b) + (a × c).


With numbers, however, the interchange of the two operations does not give 
a new rule: 


a + (b × c) = (a + b) × (a + c)


is not true in general. 
The operations ∪ and ∩ on sets behave in a much more symmetrical way:
each is distributive over the other. 
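Identities such as proposition 3.10 can also be tested mechanically on small finite sets. The following is a minimal sketch in Python, assuming sets are modelled by Python's built-in `frozenset` (where `|` is union and `&` is intersection); the helper names `subsets` and `distributive_laws_hold` are hypothetical. It checks both distributive laws for every choice of $A$, $B$, $C$ inside a small universe. A successful check is evidence rather than proof, since it covers only the finite cases tried.

```python
from itertools import combinations


def subsets(universe):
    """Return every subset of `universe` as a frozenset."""
    items = list(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def distributive_laws_hold(universe):
    """Check both laws of proposition 3.10 for all A, B, C inside `universe`."""
    subs = subsets(universe)
    for A in subs:
        for B in subs:
            for C in subs:
                if A | (B & C) != (A | B) & (A | C):   # law (a)
                    return False
                if A & (B | C) != (A & B) | (A & C):   # law (b)
                    return False
    return True


print(distributive_laws_hold({1, 2, 3}))  # True

# The numerical analogue with + and x interchanged fails, e.g. a, b, c = 1, 2, 3:
print(1 + 2 * 3, (1 + 2) * (1 + 3))       # 7 12
```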

One way to visualise these various set-theoretic identities is to draw Venn
diagrams. The identity 


$$A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$$


can be represented by drawing three overlapping discs, supposed to repre- 
sent the sets A, B, C, and proceeding as follows: 




Fig. 3.2 Three overlapping sets 


$B \cap C$ is the shaded region common to $B$ and $C$:


Fig. 3.3 $B \cap C$


and the union of this with A is: 




Fig. 3.4 $A \cup (B \cap C)$

On the other hand, $A \cup B$ is:


Fig. 3.5 $A \cup B$


and $A \cup C$ is:


Fig. 3.6 $A \cup C$


so $(A \cup B) \cap (A \cup C)$ is the region common to both, which is:


Fig. 3.7 $(A \cup B) \cap (A \cup C)$


This is the same as before. 
You may wish to try your hand at illustrating the other identities in
propositions 3.8, 3.9, and 3.10 by drawing Venn diagrams. Such pictorial devices,
if well chosen, help most people to get a coherent idea of what is going on. To 
obtain the most general picture, the diagram must be drawn with care. With 
one set A, there are two distinct regions involved, inside A and outside A: 




Fig. 3.8 Inside and outside a set 


With two sets A and B there are four regions, (1) outside both, (2) inside A 
but not B, (3) inside B but not A, and (4) inside both of them: 


Fig. 3.9 Two sets, four regions 


With three sets, A, B, and C, there are eight regions: 




Fig. 3.10 Three sets, eight regions 


If we add a fourth set D so that D meets each of these eight regions and the 
area outside D meets each of the regions, then we get sixteen regions in all. 
There is no way that this can be achieved if A, B, C, and D are all drawn as
circles. Try to draw a fourth circle in the last diagram above to meet this 
prescription, and you will see what we mean. It can be done, but not with a 
circle: 


Fig. 3.11 Four sets, ... 
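The counting behind these pictures can be made explicit: each region corresponds to a choice of ‘inside’ or ‘outside’ for every set, so a diagram showing all combinations for $n$ sets needs $2^n$ regions. A short Python sketch (the function name `venn_regions` is hypothetical) that lists these membership patterns:

```python
from itertools import product


def venn_regions(names):
    """Return every inside/outside pattern for the named sets.

    Each pattern is a tuple of booleans, one per set, where True means
    'inside that set'. A complete Venn diagram needs one region per pattern.
    """
    return list(product([False, True], repeat=len(names)))


for names in (["A"], ["A", "B"], ["A", "B", "C"], ["A", "B", "C", "D"]):
    print(f"{len(names)} set(s): {len(venn_regions(names))} regions")
# 1 set(s): 2 regions
# 2 set(s): 4 regions
# 3 set(s): 8 regions
# 4 set(s): 16 regions
```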


Many more elegant diagrams, which in principle can handle any finite num- 
ber of sets, have been devised (see [2]). Venn himself was aware of such 
limitations when he first drew the diagrams. They can be ironed out by us- 
ing more complicated shapes to represent the sets. This problem illustrates 
the need to shift from the use of pictures as an aid to mental processes, to a 
general proof that works in all cases in a manner analogous to those in pro- 
positions 3.8, 3.9, and 3.10. This is an important aspect of the journey in this 
book, from intuitive beginnings that may be imagined as pictures, to formal 
definitions and proofs that work in general. Initially, proofs should be verbal- 
ised in terms of definitions, to establish relationships that can be used with 
confidence in new settings. 
There is a general connection between unions, intersections, and subsets: 


Proposition 3.11: If A and B are sets, the following are equivalent: 


(a) $A \subseteq B$
(b) $A \cap B = A$
(c) $A \cup B = B$.


Proof: Equation (b) says that the elements common to $A$ and $B$ are all the
elements of $A$, so every element of $A$ belongs to $B$, which implies $A \subseteq B$. The
converse is obvious, so (a) and (b) are equivalent.

Equation (c) says that if we add to $B$ the elements of $A$, we still get $B$. Therefore
no element of $A$ can fail to belong to $B$, and again $A \subseteq B$. The converse
is once more obvious, so (a) and (c) are equivalent.
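As a sanity check on proposition 3.11, the following Python sketch (helper names hypothetical, sets modelled as `frozenset`) confirms that the three conditions agree for every pair of subsets of a small universe. Python's `<=` on sets tests the subset relation.

```python
from itertools import combinations


def subsets(universe):
    """Return every subset of `universe` as a frozenset."""
    items = list(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def conditions_agree(universe):
    """True if (a), (b), (c) of proposition 3.11 agree for all A, B in universe."""
    for A in subsets(universe):
        for B in subsets(universe):
            a = A <= B            # (a) A is a subset of B
            b = (A & B) == A      # (b) A ∩ B = A
            c = (A | B) == B      # (c) A ∪ B = B
            if not (a == b == c):
                return False
    return True


print(conditions_agree({1, 2, 3, 4}))  # True
```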




Complements 
Let A and B be sets. 


Definition 3.12: The set-theoretic difference A\B is defined to be the set of 
all those elements of A that do not belong to B. In symbols, 


$$A \setminus B = \{x \in A \mid x \notin B\}.$$


In a Venn diagram, A\B is the shaded region in: 


Fig. 3.12 The set-theoretic difference 


If B is a subset of A, then we call A\B the complement of B relative to A. 


Fig. 3.13 The complement of B relative to A 


It would be nice to forget about A entirely, thus defining the complement of 
B to consist of everything not belonging to B. However, this is too much to 
ask, because it would mean that B and its supposed complement would make 
up a set Q which contains absolutely everything, and we have already shown 
that such a set cannot exist. 

In a particular piece of mathematics, however, there may be a set U that 
includes all of the elements that we wish to consider. We call this set the uni- 
verse of discourse or universal set (universal, that is, for current purposes). 
When dealing with the set of integers, for example, we might take the uni- 
versal set to be U = Z. Of course, U = R would do just as well. The important 
thing is that the universal set should be sufficiently all-embracing to include
all of the elements under discussion. As one of us says elsewhere, ‘In a dis- 
cussion about dogs, when thinking about all non-sheepdogs, it is pointless to 
worry about camels’. 


Definition 3.13: Having agreed upon $U$, we define the complement $B^c$ of
every subset $B$ of $U$ by


$$B^c = U \setminus B.$$


Thus $B^c$ is the complement of $B$ relative to $U$. But because $U$ is agreed upon,
we can omit it from the notation, which is the object of the exercise.
Of course, the operation ${}^c$ obeys some simple laws. They include:


Proposition 3.14: If A and B are subsets of the universal set U, then 


(a) $\emptyset^c = U$
(b) $U^c = \emptyset$
(c) $(A^c)^c = A$


(d) If $A \subseteq B$ then $A^c \supseteq B^c$.


In view of (c) we can write $A^{cc} = (A^c)^c = A$. Less elementary, but highly
interesting, are:


Proposition 3.15 (De Morgan’s Laws): If $A$ and $B$ are subsets of the
universal set $U$, then

(a) $(A \cup B)^c = A^c \cap B^c$,

(b) $(A \cap B)^c = A^c \cup B^c$.

Proof: Let $x \in (A \cup B)^c$. Then $x \notin A \cup B$. This implies that $x \notin A$ and
$x \notin B$, so $x \in A^c$ and $x \in B^c$, so $x \in A^c \cap B^c$. Therefore $(A \cup B)^c \subseteq A^c \cap B^c$.
To obtain the reverse inclusion, reverse the steps in the argument. This
proves (a).

Equation (b) can be proved similarly. Alternatively, we can replace $A$ by
$A^c$ and $B$ by $B^c$ in (a), which gives

$$(A^c \cup B^c)^c = A^{cc} \cap B^{cc} = A \cap B.$$
Taking complements,
$$A^c \cup B^c = (A^c \cup B^c)^{cc} = (A \cap B)^c,$$


and this is (b).
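De Morgan’s laws can be checked the same way once a universal set is fixed. In this Python sketch the complement of definition 3.13 is computed as `U - X`; the choice of a small finite $U$ and the helper names are assumptions made for illustration.

```python
from itertools import combinations


def subsets(universe):
    """Return every subset of `universe` as a frozenset."""
    items = list(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def complement(X, U):
    """The complement of X relative to the agreed universal set U."""
    return U - X


def de_morgan_holds(U):
    """Check both of De Morgan's laws for all subsets A, B of U."""
    U = frozenset(U)
    for A in subsets(U):
        for B in subsets(U):
            if complement(A | B, U) != complement(A, U) & complement(B, U):
                return False   # law (a) fails
            if complement(A & B, U) != complement(A, U) | complement(B, U):
                return False   # law (b) fails
    return True


U = frozenset(range(1, 7))
A, B = frozenset({1, 2}), frozenset({2, 3})
print(sorted(complement(A | B, U)))                  # [4, 5, 6]
print(sorted(complement(A, U) & complement(B, U)))   # [4, 5, 6]
print(de_morgan_holds({1, 2, 3, 4}))                 # True
```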


These laws explain a phenomenon that the alert reader may have observed 
already: set-theoretic laws come in pairs, so that if we start with one and 
change all unions to intersections and all intersections to unions, we obtain
another. We could formulate this as follows: 


Theorem 3.16 (De Morgan Duality Principle): If in any valid set-theoretic
identity involving only the operations $\cup$ and $\cap$ the operations $\cup$
and $\cap$ are interchanged throughout, the result is another valid identity.


Proof: To prove this in general is not hard, but it needs an induction 
argument. The following example is a typical case. Start with the identity 


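$$A \cup (B \cap C) = (A \cup B) \cap (A \cup C).$$
Take complements of both sides and use De Morgan’s laws to get
$$A^c \cap (B \cap C)^c = (A \cup B)^c \cup (A \cup C)^c,$$
then use De Morgan again to get
$$A^c \cap (B^c \cup C^c) = (A^c \cap B^c) \cup (A^c \cap C^c).$$
Already we have interchanged $\cup$ and $\cap$. Now systematically replace $A$ by
$A^c$, $B$ by $B^c$, $C$ by $C^c$. Since the equation is true for any sets $A$, $B$, $C$, this is
legitimate. Using $A^{cc} = A$, $B^{cc} = B$, $C^{cc} = C$, we get


$$A \cap (B \cup C) = (A \cap B) \cup (A \cap C).$$


This is the original law, with $\cup$ and $\cap$ interchanged.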


QUESTION: How does the presence of the operation ${}^c$ affect the argument?
(Try the identity
$$B \cup (A \cap A^c) = B$$


and use the same approach. What happens?)


Sets of Sets 


It may happen that all the elements of a given set S are themselves sets. In- 
deed, it is often a useful device to consider a set of sets. For instance, we may 
have S = {A, B} where A = {1, 2}, B = {2, 3, 4}. A more sophisticated example 
is to take any set X and let P(X) be the set of all subsets of X. This is called 
the power set of X and satisfies the property: 


$$Y \in P(X) \text{ if and only if } Y \subseteq X.$$
For example, if $X = \{0, 1\}$, then $P(X) = \{\emptyset, \{0\}, \{1\}, \{0, 1\}\}$. In cases
like these, where every member of $S$ is itself a set, we can go a level further
and consider the elements belonging to these members. This gives us
generalisations of the notions of union and intersection:
$$\bigcup S = \{x \mid x \in A \text{ for some } A \in S\}$$
$$\bigcap S = \{x \mid x \in A \text{ for every } A \in S\}.$$


The taller versions of the symbols remind us that these are related to the
operations $\cup$ and $\cap$ but now apply to sets of sets. We call $\bigcup S$ the ‘union of $S$’
and $\bigcap S$ the ‘intersection of $S$’. Put into words, the union of $S$ consists of all
the elements in the members of $S$ and the intersection of $S$ consists of those
elements common to all members of $S$. For instance,


$$\bigcup \{\{1, 2\}, \{2, 3, 4\}\} = \{1, 2, 3, 4\}$$
$$\bigcap \{\{1, 2\}, \{2, 3, 4\}\} = \{2\}.$$
In general, for any set $X$,
$$\bigcup P(X) = X$$


$$\bigcap P(X) = \emptyset.$$
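The generalised union and intersection translate directly into code when the members of $S$ are finite sets. In the Python sketch below, `power_set`, `big_union`, and `big_intersection` are hypothetical helper names. Python's `frozenset.union` and `frozenset.intersection` accept any number of arguments, which matches the idea of combining all the members of $S$ at once. The sketch deliberately refuses an empty $S$ in `big_intersection`, since ‘elements common to every member of $\emptyset$’ would be everything, and there is no set containing everything.

```python
from itertools import combinations


def power_set(X):
    """P(X): the set of all subsets of X, each stored as a frozenset."""
    items = list(X)
    return {frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)}


def big_union(S):
    """The union of S: all elements lying in at least one member of S."""
    return frozenset().union(*S)


def big_intersection(S):
    """The intersection of a non-empty S: elements lying in every member of S."""
    members = list(S)
    if not members:
        raise ValueError("the intersection of an empty set of sets is not defined here")
    return frozenset(members[0]).intersection(*members[1:])


S = {frozenset({1, 2}), frozenset({2, 3, 4})}
print(sorted(big_union(S)))         # [1, 2, 3, 4]
print(sorted(big_intersection(S)))  # [2]

P = power_set({0, 1})
print(len(P))                       # 4
print(big_union(P) == {0, 1}, big_intersection(P) == frozenset())  # True True
```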


Although this notation may seem a little strange at first, it is extremely 
economical and it does act as a genuine extension of the usual concepts. For 
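instance, given two sets $A_1, A_2$, let $S = \{A_1, A_2\}$. Then


$$\bigcup S = A_1 \cup A_2$$
$$\bigcap S = A_1 \cap A_2.$$


More generally,
$$\bigcup \{A_1, A_2, \ldots, A_n\} = A_1 \cup A_2 \cup \cdots \cup A_n$$
$$\bigcap \{A_1, A_2, \ldots, A_n\} = A_1 \cap A_2 \cap \cdots \cap A_n.$$


Alternative (and much more used) notations for these last two concepts are


$$A_1 \cup A_2 \cup \cdots \cup A_n = \bigcup_{r=1}^{n} A_r$$


$$A_1 \cap A_2 \cap \cdots \cap A_n = \bigcap_{r=1}^{n} A_r.$$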


We return to generalised unions and intersections at the end of chapter 5. 


Exercises 


1. Which of the following sets are the same? 
(a) $\{-1, 1, 2\}$
(b) $\{-1, 2, 1, 2\}$
(c) $\{n \in \mathbb{Z} \mid |n| < 2 \text{ and } n \neq 0\}$
(d) $\{2, 1, 2, -2, -1, 2\}$
(e) $\{2, -2\} \cup \{1, -1\}$
(f) $\{-2, -1, 1, 2\} \cap \{-1, 0, 1, 2, 3\}$.
Prove that for all sets $A$, $B$,
$$(A \setminus B) \cup (B \setminus A) = (A \cup B) \setminus (A \cap B).$$

If A is the set of even integers, and B is the set of integers that are 
multiples of 3, describe (A\B) U (B\A). 
Write out the proofs of propositions 3.8(a), 3.8(b), 3.8(c), and all of 
proposition 3.9. Draw Venn diagrams to illustrate these results. 
Draw a Venn diagram suitable for all formulas involving five different 
sets. 


If $S = \{\text{all subsets } X \subseteq \mathbb{Z} \text{ such that } 0 \in X\}$, find $\bigcap S$ and $\bigcup S$.


If $S = S_1 \cup S_2$, prove that $\bigcup S = (\bigcup S_1) \cup (\bigcup S_2)$.

If $A$ has $n$ elements ($n \in \mathbb{N}$), calculate the number of subsets of $A$. If
you are acquainted with proof by induction, prove your result by this 
technique. 


If $A$, $B$, $C$ are finite sets and $|A|$ denotes the number of elements in $A$,
show that


$$|A \cup B \cup C| = |A| + |B| + |C| - |A \cap B| - |B \cap C| - |C \cap A| + |A \cap B \cap C|.$$


Draw a Venn diagram. 


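In each of the following statements, if we replace $S$ by one of $\mathbb{N}$, $\mathbb{Z}$,
$\mathbb{Q}$, $\mathbb{R}$, then we get a true statement. Find the appropriate set in each
case:
(a) $\{x \in S \mid x^2 = 5\} \neq \emptyset$
(b) $\{x \in S \mid -1 < x \leq 1\} = \{1\}$
(c) $\{x \in S \mid 2 < x^2 < 5\} \setminus \{x \in S \mid x > 0\} = \{-2\}$
(d) $\{x \in S \mid 1 < x \leq 4\} = \{x \in S \mid x^2 = 4\} \cup \{3, 4\}$
(e) $\{x \in S \mid 4x^2 = 1\} \setminus \{x \in S \mid x < 0\} =$
$\{x \in S \mid 5x^2 = 3\} \cup \{x \in S \mid 2x = 1\} \neq \emptyset$.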

The equation $x + y = z$ has many solutions $x, y, z \in \mathbb{N}$; the equation
$x^2 + y^2 = z^2$ has solutions including $x = 3$, $y = 4$, $z = 5$.

Let $F = \{n \in \mathbb{N} \mid x^n + y^n = z^n \text{ has a solution where } x, y, z \in \mathbb{N}\}$.

What must be done to show $F = \{1, 2\}$? What does this tell us about
verifying equality between sets in general?
CHAPTER 4


Relations 


The aim of this chapter is to introduce one of the most important
concepts in set theory. The notion of a relation is found throughout
mathematics and applies in many situations outside the subject
as well. Examples of relations involving numbers include ‘greater than’, ‘less 
than’, ‘divides’, ‘is not equal to’; examples from the realms of set theory in- 
clude ‘is a subset of’, ‘belongs to’; examples from other areas include ‘is the 
brother of’, ‘is the son of’. What all these have in common is that they refer
to two things, and the first is either related to the second in the manner de- 
scribed, or not. Thus the statement a > b, where a and b are integers, is 
either true or not true (2 > 1 is true, 1 > 2 is false). 

The two things that are related must be taken in a specific order, for in- 
stance the statement a > b is quite different from b > a. So the first thing we 
do in this chapter is to set up some machinery about ordered pairs. 

Relations can occur between elements in different sets; that is, we can have 
a relation between elements in a set A and those in a set B. Most of the ex- 
amples we mentioned concern objects from the same set, but we slipped one 
into the set-theoretic list which was ‘belongs to’. If A is a set of elements and B 
is a set whose members are themselves sets, then we can determine whether 
$x \in Y$ for each member $x \in A$ and each member $Y \in B$. Since $x \in Y$ is
either true or not true for every $x \in A$ and $Y \in B$, this defines a relation between
$A$ and $B$ in the sense to be described in this chapter. The beauty of the
description given is that it can be formulated entirely in set-theoretic terms. 

In the latter part of the chapter we develop a detailed theory of two 
particularly important types of relation: equivalence relations and order 
relations. 


Ordered Pairs 


We have said that for sets, the order in which we write the elements in a list 
makes no difference, so that for a set with two elements, {a, b} = {b, a}. This 
is all very well, but there are occasions on which it is essential to distinguish
the order. For instance, in coordinate geometry we think of all the points of 
the plane as being represented by pairs (x, y) of real numbers. The order is 
crucial; for example the points (1, 2) and (2, 1) are different: 


Fig. 4.1 Order matters 


We are thus led to the concept of an ordered pair (x, y), round brackets 
being used to make a distinction from {x, y}. The important property that 
we require of this new concept is: 


The Ordered Pair Property: 
$(x, y) = (u, v)$ if and only if $x = u$ and $y = v$. (OPP)


This notion is used throughout set theory. 

This is all very well; the only problem is that we haven’t actually said precisely
what we mean by an ordered pair. What is $(x, y)$? If $A = B = \mathbb{R}$, then
we can think of an ordered pair $(x, y)$ as a point in the coordinate plane, using
Cartesian coordinates. This, indeed, is where the notion of ordered pair
arose. In this sense we can refer to the plane as $\mathbb{R} \times \mathbb{R}$ (or, in more usual
mathematical shorthand, as $\mathbb{R}^2$). But if $A$ is a set like {apple, orange, grapefruit}
and $B$ is {knife, fork}, what then is $A \times B$? It certainly
consists of the ordered pairs: 


(apple, knife), (apple, fork), (orange, knife), (orange, fork), 
(grapefruit, knife), (grapefruit, fork). 


However, that doesn’t answer the question: what is the ordered pair 
(apple, knife)? 

The solution lies not in ‘what is it?’, but in ‘how do we get it?’. The answer 
is that to obtain (x, y) in general, we first select x from A, then we select y 
from B. That’s all. 
The mathematician Kazimierz Kuratowski saw in this process a possible
abstract definition of (x, y) using only set-theoretic notions that we have al- 
ready described. Having selected x from A, we have the singleton set {x}; then 
selecting y from B we arrive at the set {x, y}. Kuratowski defined the ordered 
pair (x, y) to consist of these sets: 


Definition 4.1 (Kuratowski): The ordered pair (x, y) of two elements x, y 
is defined to be the set 


$$(x, y) = \{\{x\}, \{x, y\}\}.$$


Notice that we get a set here. This peculiar-looking definition has the 
advantage that it satisfies the ordered pair property (OPP): 


Proposition 4.2: With Kuratowski’s definition, 
$(x, y) = (u, v)$ if and only if $x = u$ and $y = v$.


Proof: If $x = u$, $y = v$, then the definition gives $(x, y) = (u, v)$. In the other
direction, suppose that $(x, y) = (u, v)$. If $x \neq y$, then $(x, y) = \{\{x\}, \{x, y\}\}$
has two distinct members, $\{x\}$ and $\{x, y\}$, which must each belong to
$(u, v) = \{\{u\}, \{u, v\}\}$. This means that the members $\{u\}$ and $\{u, v\}$ must be different
also, implying $u \neq v$. Now we must have $\{x\} = \{u\}$ or $\{x\} = \{u, v\}$, and the
latter is clearly impossible (because it would mean that $u$, $v$ both belonged to
$\{x\}$, implying $u = x = v$, contradicting $u \neq v$). So $\{x\} = \{u\}$ and $x = u$. In
a similar fashion, $\{x, y\} = \{u, v\}$, and since $x = u$, $x \neq y$ and $y \in \{u, v\}$, we
deduce that $y = v$. Thus $x = u$ and $y = v$, as required.
If $x = y$, the set-theoretic construction collapses somewhat to give


$$(x, y) = \{\{x\}, \{x, y\}\} = \{\{x\}, \{x, x\}\} = \{\{x\}, \{x\}\} = \{\{x\}\},$$


so $(x, y)$ has only one member, namely $\{x\}$. If $(x, y) = (u, v)$, then $(u, v)$ has
only one member also, implying $\{u\} = \{u, v\}$, so $u = v$ and $(u, v) = \{\{u\}\}$. The
equality $(x, y) = (u, v)$ then becomes $\{\{x\}\} = \{\{u\}\}$, which reduces successively
to $\{x\} = \{u\}$ and then $x = u$. Thus this case reduces to $x = y = u = v$ and the
proof is complete.


By being a little more sophisticated, we can prove this result much more 
quickly. In the notation of the last section of chapter 3, 


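$$\bigcap \{\{x\}, \{x, y\}\} = \{x\} \quad \text{and} \quad \bigcup \{\{x\}, \{x, y\}\} = \{x, y\}.$$
So $\bigcap (x, y) = \{x\}$ and $\bigcup (x, y) = \{x, y\}$. If $(x, y) = (u, v)$, then comparing intersections
and unions, $\{x\} = \{u\}$ and $\{x, y\} = \{u, v\}$. The first gives $x = u$. From
this, whether $x = y$ or not, the second gives $y = v$: if $y = x$ then $\{x\} = \{x, v\}$ forces
$v = x = y$, and if $y \neq x$ then $y \in \{u, v\}$ with $y \neq u$ forces $y = v$.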
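Kuratowski’s definition can be modelled directly with nested `frozenset`s, which makes the ordered pair property easy to test. This Python sketch assumes the elements are hashable; the names `kpair`, `first`, and `second` are hypothetical. The functions `first` and `second` recover the components in the same way as the short proof, from $\bigcap (x, y) = \{x\}$ and $\bigcup (x, y) = \{x, y\}$.

```python
def kpair(x, y):
    """Kuratowski's ordered pair (x, y) = {{x}, {x, y}}."""
    return frozenset({frozenset({x}), frozenset({x, y})})


def first(p):
    """Recover x: the intersection of the members of p is {x}."""
    (x,) = frozenset.intersection(*p)
    return x


def second(p):
    """Recover y from the union {x, y}: the element other than x, or x if x = y."""
    x = first(p)
    rest = frozenset.union(*p) - {x}
    return next(iter(rest)) if rest else x


print(kpair(1, 2) == kpair(2, 1))               # False: order matters
print(len(kpair(3, 3)))                         # 1, since (3, 3) = {{3}}
print(first(kpair(1, 2)), second(kpair(1, 2)))  # 1 2

# The ordered pair property (OPP), checked exhaustively on a small domain.
values = range(3)
print(all((kpair(x, y) == kpair(u, v)) == (x == u and y == v)
          for x in values for y in values for u in values for v in values))  # True
```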

Where does this get us? First the good news: we have a definition of the 
ordered pair (x,y) involving only established set-theoretic concepts. Then 
the bad news: the definition does not correspond to the intuitive notion of 
ordered pairs in coordinate geometry. Indeed, if any mathematician were 
asked to visualise (2, 1), he or she would, like as not, think of it as a point in 
the plane; it is most unlikely that their thoughts would revolve around the 
idea {{2}, {2, 1}}. 

The pragmatic solution is to let Kuratowski’s definition fade into the 
background, safe in the knowledge that it is there should we ever be asked 
to give a rigorous foundation. The important notion is the ordered pair 
property (OPP). 

Here we meet a fundamental idea that underpins the whole of formal 
mathematics. What is important is not what a mathematical object is, but 
what its properties are. Formal mathematical concepts are specified by def- 
initions that state their required properties in terms of set theory. Other 
properties of a given concept are then deduced by mathematical proof as 
theorems. This principle has the powerful consequence that the theorems 
proved must be valid in any context that satisfies the specified definitions. 
This is true not only of situations that are familiar, but also in any situation 
we meet in the future where the definitions are satisfied. 


Mathematical Precision and Human Insight 


The situation occurring for ordered pairs happens throughout formal math- 
ematics. Essentially the same underlying mathematics can be expressed in a 
variety of different ways. For example, we now have (at least) three different
ways of thinking about an ordered pair $(x, y)$ for elements $x, y \in S$. We
can represent it symbolically as (x, y), visually as a point in the plane (when 
S is the set of real numbers), and formally as the Kuratowski definition. All 
of these have the same property that (x,y) = (x’,y’) if and only if x = x’ 
and y = y’. When we think about ordered pairs in our everyday working, 
we almost always use a visual or symbolic representation rather than the 
Kuratowski definition. 

More generally, we often write the same thing in different ways. For in- 
stance, we can write the fraction 1/2 as 2/4 or 3/6 and say that all these 
fractions are ‘equivalent’. As processes of calculation, these fractions are dif- 
ferent, but they all produce the same result. There are many other instances 
where ‘equivalent’ things are essentially the same. For instance, the algebraic 
expressions 3(x + 2) and 3x + 6 are different processes (‘treble the result of
adding x and 2’ and ‘three times x plus 6’), but they have the same result. 
If we use the concept of a function, the two functions f(x) = 3(x + 2) and 
g(x) = 3x + 6 are the same function, according to the set-theoretic definition. 

As mathematics gets more sophisticated, we realise that various ways of 
thinking about a particular concept can be conceived as a single idea. The 
Greeks sought ‘the essence’ of mathematical concepts in arithmetic and 
geometry. For example, when considering a circle, they started from phys- 
ical circles with different locations and different sizes. From these examples 
they extracted a single Platonic object: the locus (now spoken of as the ‘set’) 
of all points in a plane that are equidistant from a given point, the centre. 

Likewise, equivalent fractions represent an underlying rational number, 
and equivalent algebraic expressions represent the same algebraic object 
written in different ways. 

Something that we can hold in our minds in various different ways, all 
of which essentially represent the same underlying idea, is called a crystalline 
concept (see [35]). The term ‘crystalline’ does not mean that the concept looks 
like a crystal with regular faces; it means that it has strong links that relate its 
properties in a coherent and inevitable way. For instance, the sum of two 
numbers is a crystalline concept in the sense that 2 + 3 is 5 in the context of 
our usual number notation, and that if we take 3 from 5 then the result can 
only be 2. In the same way, if we have a triangle drawn in Euclidean geometry 
that has two equal sides, then it must have two equal angles. Now we have one 
concept, isosceles triangle, defined in two ways, rather than two differently
defined concepts that happen to be equivalent.

A ‘crystalline concept’ formulates how we think about sophisticated math- 
ematical ideas, rather than offering a formal definition of a mathematical 
concept. In the natural world, different concepts can have the same essen- 
tial structure. For example, there is a huge difference between 3 ducks each 
with 2 legs and 2 ducks with 3 legs. But in calculating the number of legs, the 
products 2 x 3 and 3 x 2 both give the same result. Similarly, taking away 
two $10 bills is different, as an operation, from adding two debts of $10, but 
the effect on your finances is the same. 

As we become more sophisticated, we do not say ‘$-2$ times $-3$ is equivalent
to 2 times 3’; we say ‘$(-2) \times (-3)$ equals $2 \times 3$’. Formal mathematics
takes these ideas to a higher level, defining the properties that a particular 
formal structure must have and deducing all its other properties by mathem- 
atical proof. Definitions that at one level are considered ‘equivalent’ concepts 
may be imagined at a higher level as a single crystalline concept. For in- 
stance, equivalent algebraic expressions become conceptualised as the same 
function. 
The notion of ‘crystalline concept’ is a psychological term rather than a
mathematical one, so you will not find it in current mathematics textbooks. 
However, it represents a breakthrough in cognitive psychology that enables 
us to think about mathematics in a more powerful way. We could plough on, 
always talking about ‘equivalence’ at every stage. However, it soon becomes 
apparent that it is more useful for our human minds to build ideas on the 
underlying crystalline concept. Mathematicians speak of this as ‘identifying’ 
equivalent concepts to create a single idea. This procedure will become more 
apparent in later chapters as the mathematics becomes more sophisticated. 


Alternative Ways to Conceptualise Ordered Pairs 


Definition 4.3: The Cartesian product $A \times B$ is the set of all ordered pairs:
$$A \times B = \{(x, y) \mid x \in A,\ y \in B\}.$$


In the case of $\mathbb{R} \times \mathbb{R}$, the picture of ordered pairs as points in the plane remains
the most useful one; it certainly satisfies the ordered pair property (OPP). The same
picture of $A \times B$ is also useful when $A$ and $B$ are subsets of $\mathbb{R}$. For
instance, if $A = \{1, 2, 3\}$ and $B = \{5, 7\}$, then $A \times B$ is the set


{(1,5), (1,7), (2,5), (2,7), (3,5), (3,7)}. 
Thinking in terms of Cartesian coordinates, we can draw a picture: 


Fig. 4.2 The Cartesian product: the six points of $A \times B$ plotted with $A$ on the horizontal axis and $B$ on the vertical axis


When $A$ and $B$ are not subsets of $\mathbb{R}$ this sort of picture is less appropriate,
but it can still be useful. For example, if $A = \{a, b, c\}$ and $B = \{u, v\}$, then


$$A \times B = \{(a, u), (a, v), (b, u), (b, v), (c, u), (c, v)\}$$


and the structure is represented by 


Fig. 4.3 A more general Cartesian product: the pairs of $A \times B$ arranged in a grid, with $A = \{a, b, c\}$ along the horizontal axis and $B = \{u, v\}$ along the vertical axis
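In Python, built-in tuples already behave like ordered pairs satisfying (OPP), and `itertools.product` enumerates a Cartesian product. A minimal sketch using the sets from the text (the helper name `cartesian_product` is hypothetical):

```python
from itertools import product


def cartesian_product(A, B):
    """A × B as a set of Python tuples (x, y) with x in A and y in B."""
    return set(product(A, B))


print(sorted(cartesian_product({1, 2, 3}, {5, 7})))
# [(1, 5), (1, 7), (2, 5), (2, 7), (3, 5), (3, 7)]

A, B = {"a", "b", "c"}, {"u", "v"}
AxB = cartesian_product(A, B)
BxA = cartesian_product(B, A)
print(len(AxB), len(BxA))  # 6 6
print(AxB == BxA)          # False: (a, u) is in A × B, (u, a) is in B × A
```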
In general, $A \times B \neq B \times A$. For example, with $A$ and $B$ as above,
$$B \times A = \{(u, a), (u, b), (u, c), (v, a), (v, b), (v, c)\}$$


which is not the same as A x B. However, the Cartesian product does obey 
some general laws: 


Proposition 4.4: For any sets A, B, C, 


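(a) $(A \cup B) \times C = (A \times C) \cup (B \times C)$
(b) $(A \cap B) \times C = (A \times C) \cap (B \times C)$
(c) $A \times (B \cup C) = (A \times B) \cup (A \times C)$
(d) $A \times (B \cap C) = (A \times B) \cap (A \times C)$.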


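Proof: All are easy, and the argument is similar in each case, so we prove
only (a), leaving the remainder as an exercise. Let $(u, v) \in (A \cup B) \times C$. Then
$u \in A \cup B$ and $v \in C$. So $u \in A$ or $u \in B$. If $u \in A$ then $(u, v) \in A \times C$; if $u \in B$
then $(u, v) \in B \times C$. Either way, $(u, v) \in (A \times C) \cup (B \times C)$. Therefore


$$(A \cup B) \times C \subseteq (A \times C) \cup (B \times C).$$


Now let $x = (y, z) \in (A \times C) \cup (B \times C)$. Either $x \in A \times C$ or $x \in B \times C$.
In the first case, $y \in A$ and $z \in C$. In the second, $y \in B$ and $z \in C$. In either case
$y \in A \cup B$ and $z \in C$, so $x = (y, z) \in (A \cup B) \times C$. This shows


$$(A \times C) \cup (B \times C) \subseteq (A \cup B) \times C.$$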


Putting the two parts together finishes the proof. 


This can be illustrated by the following diagram: 


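Fig. 4.4 $(A \cup B) \times C = (A \times C) \cup (B \times C)$: the block $(A \cup B) \times C$ is made up of the blocks $A \times C$ and $B \times C$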


Proposition 4.5: For all sets $A$, $B$, $C$, $D$,
$$(A \times B) \cap (C \times D) = (A \cap C) \times (B \cap D).$$
Proof: Let $x = (y, z) \in (A \times B) \cap (C \times D)$. Then $y \in A$, $z \in B$, and $y \in C$, $z \in D$.
So $y \in A \cap C$ and $z \in B \cap D$, so $x \in (A \cap C) \times (B \cap D)$. Hence


$$(A \times B) \cap (C \times D) \subseteq (A \cap C) \times (B \cap D).$$


Conversely let $x = (y, z) \in (A \cap C) \times (B \cap D)$. Then $y \in A$ and $y \in C$, $z \in B$,
and $z \in D$, so $x \in (A \times B) \cap (C \times D)$. Therefore


$$(A \cap C) \times (B \cap D) \subseteq (A \times B) \cap (C \times D).$$


Pictorially: 
Fig. 4.5 The intersection of Cartesian products: the rectangles $A \times B$ and $C \times D$ overlap in the rectangle $(A \cap C) \times (B \cap D)$


The same picture should make it clear why a theorem like this does not 
hold for unions in place of intersections. 
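A small numerical experiment shows both halves of this remark. The Python sketch below (with the hypothetical helper `times`, and example sets chosen only for illustration) confirms proposition 4.5 for one choice of $A$, $B$, $C$, $D$ and exhibits pairs that lie in $(A \cup C) \times (B \cup D)$ but not in $(A \times B) \cup (C \times D)$.

```python
from itertools import product


def times(X, Y):
    """The Cartesian product X × Y as a set of tuples."""
    return set(product(X, Y))


A, B = {1, 2}, {1, 2}
C, D = {2, 3}, {2, 3}

# Proposition 4.5: intersections of products behave well.
print(times(A, B) & times(C, D) == times(A & C, B & D))  # True

# The analogous statement for unions fails.
union_of_products = times(A, B) | times(C, D)
product_of_unions = times(A | C, B | D)
print(union_of_products == product_of_unions)            # False
print(sorted(product_of_unions - union_of_products))     # [(1, 3), (3, 1)]
```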
Having got ordered pairs, it is easy to go on to define ordered triples,
quadruples, etc. by setting
$$(a, b, c) = ((a, b), c)$$
$$(a, b, c, d) = (((a, b), c), d)$$
and so on. These are elements of repeated Cartesian products, defined by
$$A \times B \times C = (A \times B) \times C$$
$$A \times B \times C \times D = ((A \times B) \times C) \times D.$$
Later we find a better way to formulate the general concept of an ordered
$n$-tuple
$$(a_1, a_2, \ldots, a_n)$$


for any natural number n. At our present stage we can do this for any 
particular n by repeating the process used for triples or quadruples. These 
generalisations have similar properties to the main property (OPP) of pairs. 
For example, (a, b, c) = (u, v, w) if and only if a = u, b = v, c = w. The proof 
of this follows from repeated use of (OPP). 
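The nesting in these definitions can be mirrored with Python tuples. The sketch below (hypothetical names `triple` and `quadruple`) builds triples and quadruples only out of pairs, then checks the triple version of (OPP) exhaustively on a small domain.

```python
from itertools import product


def triple(a, b, c):
    """(a, b, c) defined as the nested pair ((a, b), c)."""
    return ((a, b), c)


def quadruple(a, b, c, d):
    """(a, b, c, d) defined as (((a, b), c), d)."""
    return (((a, b), c), d)


t = triple(1, 2, 3)
(x, y), z = t                 # unpacking reverses the nesting
print(t, x, y, z)             # ((1, 2), 3) 1 2 3
print(quadruple(1, 2, 3, 4))  # (((1, 2), 3), 4)

# (a, b, c) = (u, v, w) exactly when a = u, b = v, c = w.
vals = range(2)
print(all((triple(a, b, c) == triple(u, v, w)) == (a == u and b == v and c == w)
          for a, b, c, u, v, w in product(vals, repeat=6)))  # True
```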
Relations


Intuitively a relation between two mathematical objects a and b is some con- 
dition involving a and b that is either true or false for particular values of 
a and b. For example ‘greater than’ is a relation between natural numbers. 
Using the usual symbol > we have 


$2 > 1$ is true
$1 > 2$ is false


$3 > 17$ is false


and so on. The relation is some sort of property of the pairs of elements a, b. 
In fact we must use the ordered pair (a, b), since, for instance, 2 > 1 but not 
1>2. 

If we know for which ordered pairs (a,b) that a > b is true, then, to all 
intents and purposes, we have specified exactly what we mean by the relation 
‘greater than’. In other words, a relation may be defined by using a set of 
ordered pairs: 


Definition 4.6: Let $A$ and $B$ be sets. A relation between $A$ and $B$ is a subset
$R$ of $A \times B$.


If A = B we talk of a relation on A, which is a subset of A x A. 

This definition requires elucidation. For example, the relation ‘greater
than’ on $\mathbb{N}$ is the set of all ordered pairs $(a, b)$ where $a, b \in \mathbb{N}$ and (in the
usual sense) $a > b$. We might illustrate this set as follows:


Fig. 4.6 $a > b$ for $a, b \in \mathbb{N}$


If $R$ is a relation between sets $A$ and $B$, then we say that $a \in A$ and $b \in B$
are related by $R$ if $(a, b) \in R$. More commonly we use the notation


aRb 
to mean $(a, b) \in R$. Then $(a, b) \notin R$ will be written $a \not\mathrel{R} b$. This allows some
sleight of hand. If we denote the relation ‘greater than’ by the usual symbol
$>$, then, letting $R$ be $>$, we find that $a > b$ (in the above sense) means the
same as $(a, b) \in {>}$, and this by definition means that $a > b$ in the usual sense.
On the other hand, if $(a, b) \notin {>}$ we write $a \not> b$, which again corresponds to
normal usage. Thus we ‘recover’ the standard symbolism by an unscrupulous
trick of notation. This is an excellent idea (at least, mathematicians seem
pleased by it), and in future we use the $a \mathrel{R} b$ notation.
We consider more examples. The relation $\geq$ on $\mathbb{N}$:


Fig. 4.7 $a \geq b$ for $a, b \in \mathbb{N}$


The relation $=$ on $\mathbb{N}$:


Fig. 4.8 $a = b$ for $a, b \in \mathbb{N}$


In fact, the relation $=$ on $\mathbb{N}$ is the set $\{(x, x) \mid x \in \mathbb{N}\}$.
For a final example, let $X = \{1, 2, 3, 4, 5, 6\}$ and let ‘$|$’ be the relation ‘is a
divisor of’, so that $a \mid b$ means ‘$a$ is a divisor of $b$’. As a set of ordered pairs,


| = {(1, 1), (1, 2), (1,3), (1, 4), (1, 5), (1, 6), (2, 2), (2, 4), 
(2, 6), (3,3), (3, 6), (4, 4), (5, 5), (6, 6)}. 
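Definition 4.6 treats a relation simply as a set of ordered pairs, and that is exactly how it can be stored in code. The Python sketch below (the helper name `related` is hypothetical) builds the divisibility relation on $X = \{1, \ldots, 6\}$ from the condition $b \bmod a = 0$ and confirms that it matches the list above.

```python
X = range(1, 7)

# The relation 'is a divisor of' on X, stored as a set of ordered pairs.
divides = {(a, b) for a in X for b in X if b % a == 0}

listed = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 2), (2, 4),
          (2, 6), (3, 3), (3, 6), (4, 4), (5, 5), (6, 6)}


def related(R, a, b):
    """a R b means (a, b) belongs to R."""
    return (a, b) in R


print(divides == listed)                                # True
print(len(divides))                                     # 14
print(related(divides, 2, 6), related(divides, 6, 2))  # True False
```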
In pictures:


Fig. 4.9 a|b for numbers up to 6 


Given a relation $R$ between sets $A$ and $B$, and subsets $A'$, $B'$ of $A$ and $B$
respectively, we can define a relation $R'$ between $A'$ and $B'$ by


$$R' = \{(a, b) \in R \mid a \in A' \text{ and } b \in B'\}.$$
In fact, set-theoretically,
$$R' = R \cap (A' \times B').$$


We call $R'$ the restriction of $R$ to $A'$ and $B'$. As far as the elements of $A'$ and
$B'$ go, the relations $R$ and $R'$ say the same thing. The only difference is that $R'$
says nothing about elements not in $A'$ and $B'$.
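The formula $R' = R \cap (A' \times B')$ is a one-line computation when $R$ is finite. A Python sketch, using the divisibility relation on $\{1, \ldots, 6\}$ and the hypothetical helper `restrict`, restricting to the even numbers $A' = B' = \{2, 4, 6\}$:

```python
from itertools import product


def restrict(R, A_sub, B_sub):
    """Restriction R' = R ∩ (A' × B') of a relation given as a set of pairs."""
    return R & set(product(A_sub, B_sub))


X = range(1, 7)
divides = {(a, b) for a in X for b in X if b % a == 0}
evens = {2, 4, 6}

R_even = restrict(divides, evens, evens)
print(sorted(R_even))  # [(2, 2), (2, 4), (2, 6), (4, 4), (6, 6)]

# The same set via the first description {(a, b) in R | a in A' and b in B'}.
print(R_even == {(a, b) for (a, b) in divides if a in evens and b in evens})  # True
```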


Equivalence Relations 


The odd integers are those of the form $2n + 1$ for an integer $n$, namely
$\ldots, -5, -3, -1, 1, 3, 5, \ldots$, and the even integers are those of the form $2n$, namely
$\ldots, -4, -2, 0, 2, 4, \ldots$. In both elementary and advanced mathematics, the
distinction between odd and even integers is often important. The set Z of 
all integers splits into two disjoint subsets 


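$$\mathbb{Z}_{\text{odd}} = \{\text{all odd integers}\}$$


$$\mathbb{Z}_{\text{even}} = \{\text{all even integers}\}.$$
We can summarise this statement as
$$\mathbb{Z}_{\text{odd}} \cap \mathbb{Z}_{\text{even}} = \emptyset, \quad \mathbb{Z}_{\text{odd}} \cup \mathbb{Z}_{\text{even}} = \mathbb{Z}.$$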


There is another way to split $\mathbb{Z}$ into these two pieces, using a relation, which
for the moment we call by the noncommittal name ‘$\sim$’. Define, for $m, n \in \mathbb{Z}$,


$$m \sim n \text{ if and only if } m - n \text{ is a multiple of } 2.$$


Then 
all even integers are related by ~,
all odd integers are related by ~, 
no even integer is related to an odd integer, 
no odd integer is related to an even integer. 


These statements are a consequence of some general properties of ~, and we 
shall analyse the situation in general to see what is required. 
Imagine a set X broken up into a number of disjoint pieces. 


Fig. 4.10 A set divided into disjoint pieces 


We can define a relation ~ by 


$$x \sim y \text{ if and only if } x \text{ and } y \text{ are both in the same piece.}$$


Fig. 4.11 Defining a relation: points in the same piece are related; points in different pieces are not


Conversely, we can reconstruct the pieces from the relation $\sim$: the piece to
which $x \in X$ belongs is


$$E_x = \{y \in X \mid x \sim y\}.$$


If we try this with a different relation ~, all sorts of things can go wrong. In 
particular we may not get disjoint pieces. Consider the relation | on integers 
for which a|b means ‘a divides b without remainder’. If we take ~ to be the 
relation | on {1, 2, 3, 4, 5, 6}, then 
$$E_1 = \{1, 2, 3, 4, 5, 6\}$$
$$E_2 = \{2, 4, 6\}$$
$$E_3 = \{3, 6\}$$
$$E_4 = \{4\}$$
$$E_5 = \{5\}$$
$$E_6 = \{6\}.$$


So the set splits up according to: 


Fig. 4.12 What happens if the pieces overlap 


If instead we use the relation > on N, we do not even get $x \in E_x$, so $E_x$ is 
in no sense ‘the piece to which x belongs’. 
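The failures just described can be checked mechanically on a finite set. The following is a minimal sketch, assuming Python; the helper name `classes` is hypothetical, and the finite set {1, ..., 6} stands in for N in the case of >. It computes $E_x = \{y \mid x \text{ related to } y\}$ for each x and shows that the "pieces" for divisibility overlap, while for > no x lies in its own $E_x$.

```python
def classes(X, related):
    """Return a dict mapping each x in X to E_x = {y in X | x related y}."""
    return {x: {y for y in X if related(x, y)} for x in X}

def divides(a, b):
    """a | b: a divides b without remainder."""
    return b % a == 0

def greater(a, b):
    return a > b

X = range(1, 7)                          # the set {1, 2, 3, 4, 5, 6}

E = classes(X, divides)
for x in X:
    print(x, sorted(E[x]))               # E_1, E_2, ..., E_6 as listed in the text

# E_1 and E_2 overlap without being equal, so the pieces are not disjoint.
print(E[1] & E[2], E[1] == E[2])         # {2, 4, 6} False

# With the strict relation >, no x is related to itself, so x never lies in E_x.
G = classes(X, greater)
print(any(x in G[x] for x in X))         # False
```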

What is it that makes the original relation ~ work, whereas the others go 
wrong? We must take account of three very simple statements: 


(i) x belongs to the same piece as x; 
(ii) if x belongs to the same piece as y then y belongs to the same piece 
as x; 
(iii) if x belongs to the same piece as y and y belongs to the same piece as 
z, then x belongs to the same piece as z. 


Clearly any relation ~ with the property that x ~ y if and only if x and 
y belong to the same piece must have the three corresponding properties, 
which we formalise as (E1), (E2), (E3) of the next definition. 


Definition 4.7: A relation ~ on a set X is an equivalence relation if it has 
the following properties for x, y, z ∈ X: 

(E1) x ~ x for all x ∈ X (~ is reflexive), 
(E2) If x ~ y then y ~ x (~ is symmetric), 
(E3) If x ~ y and y ~ z then x ~ z (~ is transitive). 
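On a finite set, each of (E1), (E2), (E3) can be tested by brute force over all elements, pairs, or triples. A minimal sketch, assuming Python; the function names are hypothetical, and the window {−6, ..., 6} is only a finite stand-in for Z, so a check there illustrates the definition rather than proving anything about all of Z.

```python
from itertools import product

def is_reflexive(X, r):                   # (E1)
    return all(r(x, x) for x in X)

def is_symmetric(X, r):                   # (E2)
    return all(r(y, x) for x, y in product(X, repeat=2) if r(x, y))

def is_transitive(X, r):                  # (E3)
    return all(r(x, z) for x, y, z in product(X, repeat=3)
               if r(x, y) and r(y, z))

def is_equivalence(X, r):
    return is_reflexive(X, r) and is_symmetric(X, r) and is_transitive(X, r)

def same_parity(m, n):
    return (m - n) % 2 == 0               # m - n is a multiple of 2

def divides(a, b):
    return b % a == 0

window = range(-6, 7)                     # finite stand-in for Z
print(is_equivalence(window, same_parity))    # True
print(is_equivalence(range(1, 7), divides))   # False: 1 | 2 but not 2 | 1
```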
If we break X into disjoint pieces, the relation ‘is in the same piece as’ is an 
equivalence relation. We now show that every equivalence relation arises in 
this way from a suitable choice of pieces. In fact, there is an intimate connec- 
tion between the two concepts. First we need a formal definition of ‘breaking 
into disjoint pieces’. 


Definition 4.8: A partition of a set X is a set P whose members are non-empty 
subsets of X, subject to the conditions 

(P1) Each x ∈ X belongs to some Y ∈ P, 
(P2) If Y, Z ∈ P and Y ≠ Z, then Y ∩ Z = Ø. 


The elements of P are our ‘pieces’. Condition (P1) says that X is the union 
of all the pieces, so that each element of X lies in some piece; (P2) says that 
distinct pieces don’t overlap. It follows that no element of X can belong to 
two distinct pieces. 

Given an equivalence relation ~ on X we define the equivalence class (with 
respect to ~) of x ∈ X to be the set 

$E_x = \{y \in X \mid x \sim y\}.$


Theorem 4.9: Let ~ be an equivalence relation on a set X. Then 
$\{E_x \mid x \in X\}$ is a partition of X. The relation ‘belongs to the same piece as’ 
is the same as ~. 

Conversely, if P is a partition of X, let ~ be defined by x ~ y if and only 
if x and y lie in the same piece. Then ~ is an equivalence relation, and the 
corresponding partition into equivalence classes is the same as P. 


PRE-PROOF REMARK: This theorem lets us pass at will from an equiva- 
lence relation to a partition or back again, by a procedure which, when done 
twice, leads back to where we started. 


Proof: Since $x \in E_x$, each $E_x$ is non-empty and condition (P1) is satisfied. To verify (P2), suppose that 
$E_x \cap E_y \neq \emptyset$. Then we can find $z \in E_x \cap E_y$. Then x ~ z and y ~ z. 
By symmetry z ~ y, and then transitivity implies x ~ y. We show that 
this implies $E_x = E_y$. For if $u \in E_x$ then x ~ u and (by symmetry) y ~ x, so y ~ u; 
hence $E_x \subseteq E_y$. Similarly $E_y \subseteq E_x$. This shows that $E_x = E_y$. Thus we have 
proved that $E_x \cap E_y = \emptyset$ or $E_x = E_y$. But this statement is logically equivalent 
to (P2). 
Now define x ≈ y to mean ‘x and y are in the same equivalence class’. Then 

x ≈ y if and only if x, y ∈ $E_z$ for some z 
if and only if z ~ x and z ~ y for some z 
if and only if x ~ y. 

(For the last step: if z ~ x and z ~ y, then x ~ z by (E2), so x ~ y by (E3); conversely, if x ~ y, take z = x and use (E1).) 
Hence ≈ and ~ are the same. 
The second part of the theorem is proved in a similar manner, but is easier. 
We leave that to you. 
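Both directions of theorem 4.9 can be illustrated on a finite set: pass from a relation to its classes, from the classes back to ‘in the same piece’, and observe that we return to where we started. A minimal sketch, assuming Python, with hypothetical helper names and the finite set {0, ..., 11} standing in for a general X.

```python
from itertools import product

def partition_from_relation(X, r):
    """The set of equivalence classes E_x = {y in X | x ~ y}."""
    return {frozenset(y for y in X if r(x, y)) for x in X}

def relation_from_partition(P):
    """x ~ y if and only if x and y lie in the same piece of P."""
    return lambda x, y: any(x in piece and y in piece for piece in P)

def is_partition(X, P):
    """Non-empty pieces, (P1) with every piece inside X, and (P2)."""
    nonempty = all(len(piece) > 0 for piece in P)
    covers = set().union(*P) == set(X)
    disjoint = all(a == b or not (a & b) for a, b in product(P, repeat=2))
    return nonempty and covers and disjoint

def congruent_mod_3(m, n):
    return (m - n) % 3 == 0

X = range(12)
P = partition_from_relation(X, congruent_mod_3)
print(sorted(sorted(piece) for piece in P))
# [[0, 3, 6, 9], [1, 4, 7, 10], [2, 5, 8, 11]]

same_piece = relation_from_partition(P)
print(is_partition(X, P))                                   # True
print(all(same_piece(x, y) == congruent_mod_3(x, y)
          for x, y in product(X, repeat=2)))                # True
print(partition_from_relation(X, same_piece) == P)          # True
```

Applied to a relation that is not an equivalence relation, such as divisibility on {1, ..., 6}, `is_partition` reports False, matching the overlapping pieces found earlier.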


Example: Arithmetic Modulo n 


We use the equivalence relation concept to generalise the distinction between 
odd and even integers, and to set up what is often called (in schools) ‘modular 
arithmetic’ or (in universities) ‘the integers mod n’. 
To begin with, we specialise to n = 3. Define the relation ≡₃ of congruence 
modulo 3 on Z by 
m ≡₃ n if and only if m − n is a multiple of 3. 


Proposition 4.10: ≡₃ is an equivalence relation on Z. 

Proof: 
(E1): m − m = 0 = 3·0. 
(E2): If m − n = 3k then n − m = 3(−k). 
(E3): If m − n = 3k and n − p = 3l, then m − p = 3(k + l). 


We know that the equivalence classes (known as congruence classes mod 3) 
partition Z. What are they? It is easiest to see this with the help of examples. 


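$E_0 = \{y \mid 0 \equiv_3 y\} = \{y \mid y - 0 \text{ is a multiple of } 3\} = \{y \mid y = 3k \text{ for some } k \in \mathbb{Z}\}.$

$E_1 = \{y \mid 1 \equiv_3 y\} = \{y \mid y - 1 = 3k\} = \{y \mid y = 3k + 1 \text{ for some } k \in \mathbb{Z}\}.$

$E_2 = \{y \mid 2 \equiv_3 y\} = \{y \mid y = 3k + 2 \text{ for some } k \in \mathbb{Z}\}.$

$E_3 = \{y \mid y = 3k + 3 \text{ for some } k \in \mathbb{Z}\}.$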


However, 3k + 3 = 3(k + 1), so $E_3 = E_0$. Similarly, $E_4 = E_1$, $E_5 = E_2$, $E_6 = E_0$, 
$E_{-1} = E_2$, and so on. Every integer is either of the form 3k, 3k + 1, or 3k + 2 
(according as it leaves remainder 0, 1, or 2 on division by 3) so we get exactly 
three equivalence classes: 

$E_0 = \{\ldots, -9, -6, -3, 0, 3, 6, 9, \ldots\}$

$E_1 = \{\ldots, -8, -5, -2, 1, 4, 7, 10, \ldots\}$

$E_2 = \{\ldots, -7, -4, -1, 2, 5, 8, 11, \ldots\}.$
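The remark that every integer, negative ones included, has the form 3k, 3k + 1, or 3k + 2 corresponds to division with a non-negative remainder. As a sketch, assuming Python, whose `divmod` rounds the quotient down so that the remainder lies in {0, 1, 2} for divisor 3 (the helper name `remainder_mod` and the window −9, ..., 11 are illustrative choices), the following sorts a window of integers into the three classes listed above.

```python
def remainder_mod(y, n=3):
    """Return r with y = n*k + r and 0 <= r < n (n > 0), so y lies in E_r."""
    k, r = divmod(y, n)        # Python rounds the quotient down, so r >= 0
    assert y == n * k + r and 0 <= r < n
    return r

window = range(-9, 12)
E = {r: [y for y in window if remainder_mod(y) == r] for r in range(3)}
for r in range(3):
    print(f"E_{r}:", E[r])
# E_0: [-9, -6, -3, 0, 3, 6, 9]
# E_1: [-8, -5, -2, 1, 4, 7, 10]
# E_2: [-7, -4, -1, 2, 5, 8, 11]

print(divmod(-7, 3))           # (-3, 2), that is, -7 = 3*(-3) + 2
```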


So much for the equivalence relation. More intriguing is the possibility of 
doing arithmetic with these equivalence classes. 

To make the notation more transparent in general, we denote the equivalence 
class of n by n₃ instead of $E_n$. In this notation, the three classes above 
become 0₃, 1₃, and 2₃. Let Z₃ = {0₃, 1₃, 2₃} and define operations of addition 
and multiplication on Z₃ by 

$m_3 + n_3 = (m + n)_3, \qquad (4.1)$

$m_3 n_3 = (mn)_3. \qquad (4.2)$

For example, 1₃ + 2₃ = 3₃ = 0₃; 2₃2₃ = 4₃ = 1₃. 

This may look pointless: such an impression is erroneous, as will soon be 
seen. It may also look harmless: certain subtleties must be noticed before 
worrying that something may go wrong, and a little hard thinking put in to 
see that, after all, it doesn’t. 

Here is the subtle problem: the same class has several different names; thus 
1₃ = 4₃ = 7₃ = ..., 2₃ = 5₃ = 8₃ = .... For all we know at the moment, 
the definitions (4.1), (4.2) might give different answers to the same question, 
depending on which names we use. Thus we have seen that 1₃ + 2₃ = 0₃. But 
since 1₃ = 7₃ and 2₃ = 8₃, we also have 1₃ + 2₃ = 7₃ + 8₃ = 15₃. By a stroke of 
good fortune, 15₃ = 0₃, and we can breathe again. 

What happens in general? If i₃ = i′₃ then i − i′ = 3k for some k, and if j₃ = j′₃ 
then j − j′ = 3l for some l. Now rule (4.1) gives two possible answers: 

$i_3 + j_3 = (i + j)_3, \qquad i'_3 + j'_3 = (i' + j')_3.$

However, 

$i + j = i' + 3k + j' + 3l = (i' + j') + 3(k + l),$

so 

$(i + j)_3 = (i' + j')_3.$


Hence we get the same answer both ways, and (4.1) makes sense as a 
definition of addition. 
Similarly we must check that the multiplication rule is unambiguous. With 
i, j, i′, j′ as above, we have 

$i_3 j_3 = (ij)_3, \qquad i'_3 j'_3 = (i'j')_3.$

But 

$ij = (i' + 3k)(j' + 3l) = i'j' + 3(i'l + j'k + 3kl),$

so 

$(ij)_3 = (i'j')_3,$


which is what we want. 

This problem always arises when we try to define operations on sets by a 
rule of the type ‘select elements from the sets, operate on these, then find the 
set to which the result belongs’. When, as here, the notation conceals such a 
process, we must be careful to think what the notation means rather than just 
manipulating symbols blindly. We must check that different choices give the 
same answer. 
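Checking that different choices give the same answer can also be done by brute force over many names for each class. The sketch below, assuming Python and a hypothetical helper `is_well_defined`, tries every pair of names i, i′ for one class and j, j′ for another within the window −9, ..., 9, and asks whether the results land in the same class. A finite search like this can only support, never replace, the algebraic argument just given.

```python
from itertools import product

def is_well_defined(op, n=3, names=range(-9, 10)):
    """Check that op gives the same class mod n whichever names we pick.

    For every i, i2 naming the same class and j, j2 naming the same class,
    op(i, j) and op(i2, j2) must name the same class.
    """
    for i, i2, j, j2 in product(names, repeat=4):
        if (i - i2) % n == 0 and (j - j2) % n == 0:
            if (op(i, j) - op(i2, j2)) % n != 0:
                return False
    return True

print(is_well_defined(lambda a, b: a + b))   # True, consistent with (4.1)
print(is_well_defined(lambda a, b: a * b))   # True, consistent with (4.2)
```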

It might appear that such checks can be dispensed with, on the grounds 
that everything nice will work. But consider defining powers in Z₃. The 
natural way to do this is to mimic (4.1) and (4.2) and define 

$(m_3)^{n_3} = (m^n)_3.$

For example, $(2_3)^{2_3} = (2^2)_3 = 4_3 = 1_3$. Using this ‘definition’ we can even prove 
theorems about the laws of exponentiation, for example 

$(m_3)^{(n+p)_3} = (m^{n+p})_3 = (m^n m^p)_3 = (m^n)_3 (m^p)_3 = (m_3)^{n_3} (m_3)^{p_3}. \qquad (4.3)$

However, we would be living in a fool’s paradise. For, since 2₃ = 5₃, the same ‘definition’ 
also tells us that 

$1_3 = (2_3)^{2_3} = (2_3)^{5_3} = (2^5)_3 = 32_3 = 2_3.$

Since 1₃ ≠ 2₃, this shows that the ‘definition’, and with it (4.3), is nonsense: clever and plausible 
nonsense, the most dangerous kind. 
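The failure can be reproduced with Python's built-in three-argument `pow`, which computes a power and then reduces it modulo the third argument. This is only a numerical illustration of the computation in the text, using the two names 2 and 5 for the exponent class.

```python
# 2_3 and 5_3 are two names for the same class, since 5 - 2 = 3.
assert (5 - 2) % 3 == 0

# The would-be rule (m_3)^(n_3) = (m^n)_3, evaluated with each name for the exponent.
with_name_2 = pow(2, 2, 3)    # (2^2)_3 = 4_3, remainder 1
with_name_5 = pow(2, 5, 3)    # (2^5)_3 = 32_3, remainder 2
print(with_name_2, with_name_5)   # 1 2: different classes, so the rule defines nothing
```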
In common parlance, we must check that the operations are ‘well defined’. 
Really this is over-polite: what we are checking is that they are ‘defined’ at all! 
Having digressed at length, let us return to the arithmetic of Z₃. We can 
write out addition and multiplication tables: 

 +  | 0₃ 1₃ 2₃          ×  | 0₃ 1₃ 2₃
 0₃ | 0₃ 1₃ 2₃          0₃ | 0₃ 0₃ 0₃
 1₃ | 1₃ 2₃ 0₃          1₃ | 0₃ 1₃ 2₃
 2₃ | 2₃ 0₃ 1₃          2₃ | 0₃ 2₃ 1₃

It can be verified that many of the usual laws of arithmetic hold (such as 
x + y = y + x, x(y + z) = xy + xz), although there are some surprises such as 

$((1_3 + 1_3) + 1_3) + 1_3 = 1_3.$


Instead of 3, we can use any positive integer n and do arithmetic modulo n. We define 
a relation ≡ₙ on Z by 

x ≡ₙ y if and only if x − y is a multiple of n.¹ 

We get n distinct equivalence classes 0ₙ, 1ₙ, 2ₙ, ..., (n − 1)ₙ, while 
nₙ = 0ₙ, (n + 1)ₙ = 1ₙ, and so on; now, for 0 ≤ x ≤ n − 1, the class xₙ consists of those integers that 
leave remainder x on division by n. The set Zₙ of equivalence classes admits 
operations of arithmetic defined in the same way as (4.1) and (4.2). 
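Since each class xₙ with 0 ≤ x ≤ n − 1 is named by its remainder, the operations (4.1) and (4.2) on Zₙ amount to computing with remainders. A minimal sketch, assuming Python, with hypothetical helpers `tables` and `show` that name each class by 0, ..., n − 1; for n = 3 it reproduces the tables of Z₃.

```python
def tables(n):
    """Addition and multiplication tables of Z_n, naming each class by 0, ..., n-1."""
    add = [[(i + j) % n for j in range(n)] for i in range(n)]
    mul = [[(i * j) % n for j in range(n)] for i in range(n)]
    return add, mul

def show(table, symbol):
    """Print a table with a header row and a header column."""
    n = len(table)
    print(symbol, "|", *range(n))
    for i, row in enumerate(table):
        print(i, "|", *row)

add3, mul3 = tables(3)
show(add3, "+")
show(mul3, "*")
```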

We discuss these ideas further in chapter 10. 


Subtle Aspects of Equivalence Relations 


Although the definition of equivalence relation seems simple, and in our ex- 
perience virtually all students can write down the three properties, there are 
subtle aspects that are not apparent without deeper consideration. 

For instance, (E1) requires that x ~ x for all x ∈ X. Some examples, such 
as lines being parallel in Euclidean geometry, seem to be equivalences, but 
they do not satisfy (E1) because technically a line cannot be parallel to itself. 
(Parallel lines have no point in common.) Parallel lines satisfy (E2), and if 
x, y, z are all different (E3) is satisfied. We could, if we wished, define the 
relation x ~ y for lines in the plane to mean ‘x is parallel to y or x = y’. In 
this case we would have an equivalence relation. 

In general, we must check the full meaning of a definition very carefully. 
The precise meaning of a definition is a recurring theme in the rest of this 
book. A definition means what it says, no more and no less. 


Non-Example 4.11: The following question was set in a first-year univer- 
sity examination: 

Given a set S with three distinct elements a, b, c, is the relation where only 
the following hold 


a ~ a, b ~ b, a ~ b, b ~ a 


an equivalence relation? 


¹ The standard notation is x ≡ y (mod n). The symbol ≡ₙ is used here for consistency 
with our notation for a relation. 
Think about it and write down your response before you read the next 
paragraph. 

You may very likely get the correct answer ‘no’. The reason is that we are 
not told that c ~ c, so (E1) is not satisfied. However, when this question 
was given to well-qualified students, many did not notice this. Instead, they 
focused on (E3), which says 


If x ~ y and y ~ z then x ~ z, 


and many declared it to be false, claiming that it requires three elements, x, y, 
z, while the relation involves only two: a, b. However, in set theory different letters 
may represent the same element. So we could take x = a, y = b, z = b, or any 
other combination to show that, in every case, (E3) is true. 
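Checking (E3) in every case means running through all triples (x, y, z), repeated letters included. A minimal sketch, assuming Python and representing the relation of non-example 4.11 as a set of ordered pairs, confirms that only (E1) fails.

```python
from itertools import product

S = {"a", "b", "c"}
pairs = {("a", "a"), ("b", "b"), ("a", "b"), ("b", "a")}   # the only instances of ~

def related(x, y):
    return (x, y) in pairs

reflexive = all(related(x, x) for x in S)
symmetric = all(related(y, x) for x, y in product(S, repeat=2) if related(x, y))
# (E3) is checked over all triples, repeated letters included (e.g. x = a, y = b, z = b).
transitive = all(related(x, z) for x, y, z in product(S, repeat=3)
                 if related(x, y) and related(y, z))

print(reflexive, symmetric, transitive)   # False True True: c ~ c is missing
```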

It is also easy to come to the belief that the equivalence relations met in 
mathematics, such as the integers modulo n, have equivalence classes that are 
similar in some way. For instance the equivalence classes for the relation of 
integers modulo 3 are all infinite sets with a natural mapping between them. 
Examples like this may lead unconsciously to the expectation that equiva- 
lence classes are all like this in some way. In general, however, the partition 
theorem shows that equivalence relations break a set up into (non-empty) 
subsets of any size. The subsets chosen in a set do not need to have any spe- 
cial properties, other than every element in the set must lie in precisely one 
of the subsets in the partition. 

An interesting example is equivalence of infinite decimals. Two infinite 
decimals lie in the same equivalence class if they have the same value. As we 
have shown in chapter 1, each real number either has a unique decimal expansion, or it is a 
terminating decimal that can be written in exactly two ways: ending in an infinite 
sequence of zeros or in an infinite sequence of nines (such as 3·47000... = 3·46999...). Here some equivalence classes 
contain one element, and all others contain two. 


Order Relations 


A second kind of relation, whose properties are quite different from those 
of an equivalence relation, arises when dealing with the order between 
numbers, as exemplified by the statements 4 < 5, π > 2·7, x² ≥ 0, 1 − x² ≤ 1 
for any real number x. 

Fortunately, the various relations <, >, ≤, ≥ are all connected with each 
other: 

x < y means the same as y > x, 
x ≤ y means the same as y ≥ x, 
x ≤ y means the same as x < y or x = y, 
x < y means the same as x ≤ y and x ≠ y. 
Therefore we need study only one of them and translate the results to the 
others. 

When handling numbers, it might be preferable to consider one of the 
strict relations < or >. In general we would not write 2 + 2 ≥ 4, simply because 
we know something more precise: 2 + 2 equals 4. Likewise, we normally 
write 2 + 2 > 3, because that contains more exact information than 2 + 2 ≥ 3. 
But when passing to general statements, the situation changes. For instance, 
it is true that if aₙ → a, bₙ → b and aₙ ≥ bₙ, then a ≥ b, but aₙ > bₙ does 
not imply a > b. (A counterexample is given by aₙ = 1/n, bₙ = 0.) Here 
there is a slight preference towards the weak inequalities ≤ and ≥. 

We begin with the latter. 


Definition 4.12: A relation R on a set A is a weak order if, for all a, b, c ∈ A, 

(WO1) a R b and b R c imply a R c, 
(WO2) either a R b or b R a (or both), 
(WO3) a R b and b R a imply a = b. 


These properties evidently hold for both relations ≤ and ≥ on the set of 
real numbers, which may seem rather strange since one means ‘bigger’ and 
the other ‘smaller’. But looking at the real numbers as points on a line, we 
see that, by a reflection, we can turn the order round, interchanging left and 
right, and this interchanges ≤ and ≥. It is only when we start doing arithmetic, 
and require a ≥ 0, b ≥ 0 to imply ab ≥ 0, that we find a property 
of ≥ that does not hold for ≤. We will postpone this consideration until we 
study arithmetic in chapter 9. 

Weak order relations naturally come in pairs. Given a weak order R, we 
can define R′, the reverse of R, by 

a R′ b means b R a. 

The reverse R′ is also a weak order relation, and reversing again we get 

R″ = R. 
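The axioms (WO1)-(WO3) and the passage to the reverse relation can be tried out on a small finite set. A minimal sketch, assuming Python, with hypothetical helpers `is_weak_order` and `reverse`, using ≤ on {0, 1, 2, 3, 4} as the weak order.

```python
from itertools import product

def is_weak_order(A, R):
    wo1 = all(R(a, c) for a, b, c in product(A, repeat=3) if R(a, b) and R(b, c))
    wo2 = all(R(a, b) or R(b, a) for a, b in product(A, repeat=2))
    wo3 = all(a == b for a, b in product(A, repeat=2) if R(a, b) and R(b, a))
    return wo1 and wo2 and wo3

def reverse(R):
    """The reverse R': a R' b means b R a."""
    return lambda a, b: R(b, a)

def at_most(a, b):
    return a <= b

A = range(5)
print(is_weak_order(A, at_most), is_weak_order(A, reverse(at_most)))   # True True

twice = reverse(reverse(at_most))          # R''
print(all(twice(a, b) == at_most(a, b) for a, b in product(A, repeat=2)))  # True
```

Note that the strict relation < fails (WO2) when a = b, so it is not a weak order in this sense.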


Example 4.13: If A = {a, b, c}, where a, b, c are all distinct, then a weak 
order on A can be defined by a R b, a R c, b R c, a R a, b R b, c R c. We can 
visualise this by considering a, b, c in a row, with x R y meaning x is to the 
left of y or x = y. 

(Figure: the points a, b, c marked in a row, from left to right.) 


The reverse of R simply puts the elements in order c, b, a. 
Example 4.14: Define an order relation R on the plane by: 
(x₁, y₁) R (x₂, y₂) means 

either y₁ = y₂ and x₁ ≤ x₂ (both together), 
or y₁ < y₂. 

This at first looks bizarre, but in a picture we see that (x₁, y₁) R (x₂, y₂) means 
that either (x₁, y₁) and (x₂, y₂) are on the same horizontal line with (x₁, y₁) 
to the left of (or equal to) (x₂, y₂), or (x₁, y₁) is on a horizontal line strictly 
below the one through (x₂, y₂). 


Fig. 4.13 An order on the plane 
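The order of example 4.14 compares heights first and only then positions along a horizontal line. As a sketch, assuming Python, whose tuples compare lexicographically, sorting by the key (y, x) should produce exactly this order; the names `plane_order` and `sort_key` and the sample points are illustrative.

```python
def plane_order(p, q):
    """(x1, y1) R (x2, y2): either y1 = y2 and x1 <= x2, or y1 < y2."""
    (x1, y1), (x2, y2) = p, q
    return (y1 == y2 and x1 <= x2) or y1 < y2

def sort_key(p):
    """Compare the pairs (y, x) lexicographically: height first, then left to right."""
    x, y = p
    return (y, x)

points = [(3, 1), (0, 2), (5, 0), (-1, 1), (2, 2)]
print(sorted(points, key=sort_key))   # [(5, 0), (-1, 1), (3, 1), (0, 2), (2, 2)]

# On these points, comparing keys agrees with R exactly.
print(all(plane_order(p, q) == (sort_key(p) <= sort_key(q))
          for p in points for q in points))   # True
```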


Example 4.15: A = {{0}, {0, 1}, {0, 1, 2}, {0, 1, 2, 3}}, 
x R y means x ⊆ y for x, y ∈ A. 

Here {0} ⊆ {0, 1} ⊆ {0, 1, 2} ⊆ {0, 1, 2, 3}. 

Inclusion of sets satisfies 

X ⊆ Y and Y ⊆ Z imply X ⊆ Z, 
X ⊆ Y and Y ⊆ X imply X = Y, 

but for arbitrary sets X, Y we may have X ⊈ Y and Y ⊈ X. This means that 
set inclusion in general satisfies (WO1), (WO3), but not (WO2). A relation R 
on a set A satisfying (WO1) and (WO3) is said to be a partial order, and A a 
partially ordered set, or, with some loss in dignity, a poset. Given any set A of 
sets, inclusion always yields a partial order on A. 
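The contrast between the chain of example 4.15 and an arbitrary collection of sets can be seen by testing the three axioms for inclusion. A minimal sketch, assuming Python, where `frozenset` comparison with `<=` is the subset test; the helper names are hypothetical, and the power set of {0, 1} is chosen as a small collection containing the incomparable sets {0} and {1}.

```python
from itertools import chain, combinations, product

def power_set(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    subsets = chain.from_iterable(combinations(s, k) for k in range(len(s) + 1))
    return [frozenset(c) for c in subsets]

def included(X, Y):
    return X <= Y                      # X is a subset of Y

def order_axioms(A, R):
    """Truth values of (WO1), (WO2), (WO3) for R on the finite set A."""
    wo1 = all(R(a, c) for a, b, c in product(A, repeat=3) if R(a, b) and R(b, c))
    wo2 = all(R(a, b) or R(b, a) for a, b in product(A, repeat=2))
    wo3 = all(a == b for a, b in product(A, repeat=2) if R(a, b) and R(b, a))
    return wo1, wo2, wo3

example_4_15 = [frozenset(range(k)) for k in range(1, 5)]   # {0}, ..., {0, 1, 2, 3}
print(order_axioms(example_4_15, included))        # (True, True, True)
print(order_axioms(power_set({0, 1}), included))   # (True, False, True)
```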

Let R be a weak order on a set A; then the corresponding strict order S is 
given by: 


x S y means precisely x R y and x ≠ y. 

For example if R is ≤, then S is <. 




Proposition 4.16: A strict order S on a set A satisfies: 

(SO1) a S b and b S c imply a S c, 
(SO2) Given a, b ∈ A, precisely one of the following holds (and not 
the other two): 

a S b, b S a, a = b. 

Proof: Suppose that a S b and b S c. Then a R b and b R c, and (WO1) implies 
that a R c. We cannot have a = c, for substituting in b R c we get b R a, and by 
(WO3) this and a R b give a = b, contradicting a S b. This verifies (SO1). By 
(WO2), a, b ∈ A implies a R b or b R a, so a S b or a = b or b S a. But no two 
of these can hold simultaneously, because a = b contradicts the definitions of 
both a S b and b S a, and were a S b and b S a to hold simultaneously, then a R b 
and b R a would hold, so (WO3) gives a = b, contradicting a S b once more. This 
verifies (SO2). 


Condition (SO2) in proposition 4.16 is usually referred to as the trichotomy 
law. (Just as a dichotomy is two mutually exclusive possibilities, a trichotomy 
is three, in this case a Sb, b Sa, or a = b.) For the strict order < on the real 
numbers the three mutually exclusive possibilities are a < b, b < a, a = b. 

We remarked earlier that we could return to the weak order ≤ from < 
through the connection 

a ≤ b means precisely a < b or a = b. 


The same happens for any strict order. Given a relation S on a set A satisfying 
(SO1) and (SO2), define 


a R b to mean a S b or a = b. 


It is easy to verify that R satisfies (WO1)-(WO3), and that we can pass freely 
from a weak order to the corresponding strict order and back again. In this 
manner the notions of weak and strict order are interchangeable. Although 
we have taken (WO1)-(WO3) as the basic axioms and proved the proper- 
ties (SO1), (SO2), we could just as easily reverse their status by taking (SO1), 
(SO2) as basic axioms and deducing (WO1)-(WO3). 
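The passage from a weak order to its strict order and back, together with the trichotomy law (SO2), can be checked on a finite set. A minimal sketch, assuming Python, with hypothetical helper names and ≤ on {0, ..., 5} as the starting weak order.

```python
from itertools import product

def strict_from_weak(R):
    """x S y means x R y and x != y."""
    return lambda x, y: R(x, y) and x != y

def weak_from_strict(S):
    """a R b means a S b or a = b."""
    return lambda a, b: S(a, b) or a == b

def trichotomy(A, S):
    """(SO2): for all a, b in A exactly one of a S b, b S a, a = b holds."""
    return all([S(a, b), S(b, a), a == b].count(True) == 1
               for a, b in product(A, repeat=2))

def at_most(a, b):
    return a <= b

A = range(6)
S = strict_from_weak(at_most)
print(trichotomy(A, S))                                                     # True
R_again = weak_from_strict(S)
print(all(R_again(a, b) == at_most(a, b) for a, b in product(A, repeat=2)))  # True
```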


Exercises 
1. Write out the proofs of propositions 4.4(b), 4.4(c), and 4.4(d). 
2. Prove that 

$\left(\bigcup S\right) \times \left(\bigcup T\right) \subseteq \bigcup \{X \times Y \mid X \in S,\ Y \in T\}$

$\left(\bigcap S\right) \times \left(\bigcap T\right) = \bigcap \{X \times Y \mid X \in S,\ Y \in T\}$

for all sets S, T of sets. Show that in the first formula we cannot replace 
‘⊆’ by ‘=’. 

3. If A = Ø, show that A × B = Ø = B × A for every set B. When A ≠ Ø, 
show that A × B = A × C implies B = C. Given A × B = B × A, what 
can be deduced about A and B? 


4. Let A = N × N. Define the relation R on A by 
(m, n) R (r, s) means m + s = r + n. 

Show that R is an equivalence relation. 
If B = {(x, y) ∈ Z × Z | y ≠ 0}, and S is the relation on B given by 

(a, b) S (c, d) if and only if ad = bc, 

is S an equivalence relation? Prove your assertion. 
5. How many distinct equivalence relations exist on {1, 2, 3, 4}? 


6. Recall the properties (E1), (E2), (E3) of an equivalence relation. Which 
of these properties is satisfied by the relation between x, y ∈ R given 
by: 

(a) x < y 
(b) x > y 
(c) |x − y| < 1 
(d) |x − y| < 0 
(e) x − y is rational 
(f) x − y is irrational 
(g) (x − y) < 0. 

7. Is there a mistake in the following proof that (E2) and (E3) imply (E1)? 
If so, what is it? 

Let a ~ b. 
By (E2), b ~ a. By (E3), if a ~ b and b ~ a, then a ~ a. 
This proves (E1). 


8. Give examples of relations (the more elegant, the better) satisfying 
(a) none of the properties (E1), (E2), (E3) 
(b) (E1) but not (E2) or (E3) 
(c) (E2) but not (E1) or (E3) 
(d) (E3) but not (E1) or (E2) 
(e) (E2) and (E3) but not (E1) 
(f) (E1) and (E3) but not (E2) 
(g) (E1) and (E2) but not (E3). 

9. Write out addition and multiplication tables for the integers mod 4, 
mod 5, and mod 6. 
10. Find all a, b ∈ Z₁₂ such that ab = 0₁₂. 


11. Define a relation R on N by 
a R b means a divides b, 

that is, b = ac for some c ∈ N. Is R an order relation? If so, is it a weak 
order or a strict order? 


12. Let X = {1, 2, 6, 30, 210} and define a relation S on X by 
a S b means a divides b. 


Is S an order relation? If so, is it a weak order or a strict order? 


13. Let A be a set with a (strict) order relation S and B a set with (strict) 
order relation T. Define the lexicographic relation L on A × B by 

(a, b) L (c, d) means: either a S c, or a = c and b T d. 


Is this an order relation? What is the connection between this and a 
dictionary? 
CHAPTER 5 

Functions 

Functions are of enormous importance throughout the whole of modern 
mathematics, at all levels. The concept first became prominent in 
calculus, which is about functions: how to differentiate them or integrate 
them. Early attempts to develop a general concept of functions were 
somewhat confused and unsatisfactory, largely because they tried to do too 
much at once. The function concept as it is now understood evolved gradually 
from these attempts: it has great generality and great simplicity. In 
fact, the current concept is so general that when doing calculus, extra conditions 
must be imposed to restrict the class of functions under consideration 
to those that can be differentiated or integrated. Thus the desired object is 
achieved by taking a very general definition of ‘function’ and then selecting 
more special types of function by imposing extra conditions. 

In this chapter we develop the general concept of a function gradually, 
starting from familiar examples and extracting general principles. We discuss 
some general properties that functions can have. We introduce the graph 
of a function, and relate it both to the formal definition and the traditional 
picture. 


Some Traditional Functions 


Traditionally, a function is defined by introducing a ‘variable’ x, usually supposed 
to be a real number, and talking of a ‘function f(x)’ of x. The most 
significant feature of such a definition is that in principle we should be able 
to work out the value of f(x) for any given x (possibly under restrictions such 
as x ≠ 0, x > 0, depending on the function involved). Here are some familiar 
examples: 

• The exponential function takes value eˣ for any real number x (where 
e = 2·71828...). 
• The sine, cosine, and tangent functions take values sin(x), cos(x), tan(x) 
for all real x, except that for the tangent we have to assume x is not an odd 
integer multiple of π/2 in order for the definition tan(x) = sin(x)/cos(x) to 
make sense. (Often the parentheses are omitted and we write sin x, cos x, 
and so on.) 

• The logarithmic function takes the value log(x) when x is real and x > 0. 
(The notation log x is also common.) 

• The reciprocal function takes value 1/x for real x ≠ 0. 

• The square function takes value x² for any real x. 

• The factorial function x! is defined only for x a positive integer. 


What do all of these examples have in common? It is our ability, in principle, 
to calculate the value of the function corresponding to the relevant values 
of x. In other words, a function associates to each relevant real number x a 
value f(x) which is also a real number. In the above examples, respectively, 
f(x) = eˣ, sin(x), cos(x), tan(x), log(x), 1/x, x², x!. 

We should not confuse the values of the function with the function itself. 
It is not log(x) that is the function: it is the rule ‘take the logarithm of’, which 
allows us to work out the value. In a sense, the function itself is the symbol 
‘log’. So we think of a function f as some ‘rule’ which, for any real number x 
(perhaps subject to restrictions), defines another real number f(x). The def- 
inition of f(x) should be unique; a rule that gives two different answers to 
the same question is not especially useful. This means we must be careful 
with such functions as ‘square root’, specifying whether we mean the posi- 
tive square root or the negative one. Don’t worry about this now; we'll return 
to it later when the basic ideas are established. 


The General Function Concept 


The most general definition of a function comes from the traditional one by 
relaxing the requirement that x and f(x) should be real numbers. In fact, even 
in traditional mathematics, complex numbers are permitted; indeed, a wide 
variety of non-numerical things as well. For example, the area of a triangle is 
a function defined on triangles. The easiest and most satisfactory assumption 
is not to impose restrictions of any kind on the nature of x or f(x). How- 
ever, we must then be more precise about what we mean by a rule, because 
traditional formulas are too limited in scope. 

In our examples of functions, x ranges over some set of possible choices, 
and so does f(x). The natural choices for these sets are often different; for 
example the logarithmic function requires x > 0, whereas log(x) can be any 
real number. 
We therefore begin with two arbitrary sets A and B. As a preliminary 
definition we formulate: 

Preliminary Definition 5.1: A function f from A to B is a rule that assigns 
to each a ∈ A a unique element f(a) ∈ B. 


This definition is very broad. It includes all of the previous examples: take A 
to be a suitable subset of R and B to be R. Thus: 


For the exponential: A = R, B = R, and f(x) = eˣ. 

For the logarithm: A = {x ∈ R | x > 0}, B = R, and f(x) = log(x). 
For the reciprocal: A = {x ∈ R | x ≠ 0}, B = R, and f(x) = 1/x. 
For the factorial: A = {x ∈ Z | x > 0}, B = N, and f(x) = x!. 


Examples of rather different types of function that this definition allows 
include: 

A = {all circles in the plane}, B = R, f(x) = the radius of x. 

A = {all circles in the plane}, B = R, f(x) = the area of x. 

A = {all non-empty subsets of {0, 2, 4}}, B = N, f(x) = the smallest element of x. 

A = {all subsets of {0, 1, 2, 3, 4, 5, 6, 7}}, B = {0, 1, 2, 3, 4, 5, 6, 7, 8}, 
f(x) = the number of elements of x. 

A = N, B = {0, 1, 2}, f(x) = the remainder on dividing x by 3. 

A = {camel, lion, elephant}, B = {January, March}, 
f(camel) = March, f(lion) = January, f(elephant) = March.¹ 


Definition 5.2: We call A the domain of f and B the codomain. We write 
f : A → B 

to mean ‘f is a function with domain A and codomain B’. 


The main item still on the agenda is that troublesome word ‘rule’. We 
obtain a formal definition in exactly the same way that we obtained one 
for ‘relation’ in chapter 4, by judicious use of ordered pairs. We want to 
associate to each x ∈ A an element f(x) ∈ B. One way to do this is to stick 
them together in an ordered pair (x, f(x)). The ‘rule’ is then the entire 


¹ This is of course a pretty silly function, with no mathematical importance. It illus- 
trates that quite arbitrary definitions of f(x) may be made. Actually, this one isn’t quite so 
arbitrary as it may seem. A certain zoo has three animal-houses: the camel-house, the lion- 
house, and the elephant-house. Once a year the houses are redecorated: the lion-house in 
January, the others in March. Now f(x) = the month in which the x-house is redecorated. 
set of ordered pairs (x, f(x)) as x runs through A, and this is of course a 
subset of the Cartesian product A × B. 

In the previous chapter, we defined a subset of A × B to be a relation from 
A to B. This means that a function can be viewed as a special kind of relation: it 
relates x to f(x). 

The requirement that f(x) is defined for every x ∈ A translates as the 
requirement that for any x ∈ A there is some element (x, y) in the set. 
Uniqueness of f(x) translates as the requirement that for each x, the corresponding 
element y should be unique. Now we see how a set of pairs can 
capture a rule: to find f(x), look in the set for a pair (x, y). This exists and is 
unique, so we put f(x) = y. 

Formally: 


Definition 5.3: Let A and B be sets. A function f : A → B is a subset f of 
A × B such that 

(F1) If x ∈ A there exists y ∈ B such that (x, y) ∈ f. 
(F2) Such an element y is unique: in other words, if x ∈ A and y, z ∈ B are 
such that (x, y) ∈ f and (x, z) ∈ f, it follows that y = z. 

A function is also called a map or mapping. 
In terms of this definition, the ‘square’ function with domain R is the 
subset 
{(x, x²) | x ∈ R} 
of R × R. The curious function above is the set 
{(camel, March), (lion, January), (elephant, March)}. 

We recover the usual notation by defining f(x) to be the unique element 
y ∈ B such that (x, y) ∈ f. 
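Definition 5.3 can be checked directly for a finite set of pairs: every x ∈ A must occur as a first coordinate exactly once, and every pair must lie in A × B. A minimal sketch, assuming Python, with a hypothetical helper `is_function`, applied to the zoo function above; converting the set of pairs to a `dict` recovers the usual f(x) notation.

```python
def is_function(f, A, B):
    """Check that the set of pairs f is a subset of A x B satisfying (F1) and (F2)."""
    if not all(x in A and y in B for (x, y) in f):
        return False
    for x in A:
        values = {y for (a, y) in f if a == x}
        if len(values) != 1:      # no value: (F1) fails; several values: (F2) fails
            return False
    return True

A = {"camel", "lion", "elephant"}
B = {"January", "March"}
f = {("camel", "March"), ("lion", "January"), ("elephant", "March")}

print(is_function(f, A, B))                          # True
print(is_function(f | {("lion", "March")}, A, B))    # False: (F2) fails at lion
print(is_function(f - {("camel", "March")}, A, B))   # False: (F1) fails at camel

value = dict(f)            # the usual notation: f(x) is value[x]
print(value["lion"])       # January
```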

The definition of a function in terms of a set of ordered pairs is formally 
very neat, because it states everything in terms of sets. But it is pedantic and 
pointless to use ordered pairs when we wish to define a specific function. 
Instead, we use a form of words along the following lines: 


‘Define a function f : A → B by f(x) = whatever for all x ∈ A.’ 
An alternative notation, often employed, is 
‘Define a function f : A → B, x ↦ whatever.’ 

In any particular case, ‘whatever’ is replaced by a specific prescription to find 
f(x) given x. These statements are interpreted formally as: 

‘f is the subset of A × B consisting of all pairs (x, f(x)) with x ∈ A.’ 
Then we must check that the prescription defines f(x) uniquely, and that 
f(x) ∈ B, for all x ∈ A. 


Example 5.4: Define the function f : N → Q by 
f(n) = √2 to n decimal places. 

Then 
f(1) = 1·4, 
f(2) = 1·41, 
f(3) = 1·414, 

and so on. This rule defines f(n) uniquely because √2 is irrational, so its 
decimal expansion is not one of those that can end either in lots of 0s or in lots of 9s. Also, 
f(n) is always rational since it is a terminating decimal. 

The formal function is the subset of N × Q comprising all ordered pairs 

(n, √2 to n decimal places), 
namely 
{(1, 1·4), (2, 1·41), (3, 1·414), ...}. 


The advantage of the informal usage is manifest. But knowing how to 
translate it into formal terms means that the informality is safe. 
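The values f(n) can be computed exactly, without floating-point error. The sketch below assumes Python and assumes that ‘to n decimal places’ means cutting off the decimal expansion after n places, which matches 1·4, 1·41, 1·414; rounding would sometimes differ (for example at the sixth place, since √2 = 1·4142135...). The name `sqrt2_to_places` is hypothetical.

```python
from fractions import Fraction
from math import isqrt

def sqrt2_to_places(n):
    """sqrt(2) cut off after n decimal places, as an exact rational number.

    isqrt(2 * 10**(2n)) is the largest integer m with m*m <= 2 * 10**(2n),
    i.e. the integer part of 10**n * sqrt(2); dividing by 10**n keeps
    exactly the first n decimal places.
    """
    m = isqrt(2 * 10 ** (2 * n))
    return Fraction(m, 10 ** n)

for n in range(1, 4):
    print(n, sqrt2_to_places(n))   # 1 7/5, then 2 141/100, then 3 707/500
```

The output is printed as reduced fractions (1·4 = 7/5, 1·414 = 707/500), which makes visible that each value is rational.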


Non-Examples 5.5: Here are a number of statements that look as if they 
define functions, but on closer inspection fail in some respect. 
(1) Define f : R → R by f(x) = 1/(x + 1). 
This does not define a function since when x = −1, 1/(x + 1) is not 
defined, so f(−1) has not been specified as a real number. If we change 
the definition to start ‘f : R \ {−1} ...’ then we're all right. 
(2) Define f : Q → Q by f(x) = √x (positive square root). 
This does not define a function because for some x, for instance x = 2, 
the value f(x) = √2 does not belong to Q; and for negative x there is no real 
square root at all. If we restrict the domain to {x ∈ Q | x ≥ 0} and change 
the second Q to an R, then all will be well. 
(3) Define f : R → Q by f(x) = the rational number nearest x. 
This does not define a function: when x is irrational the supposed f(x) does not exist, 
since between x and any rational number there is another rational number closer to x. 
(4) Define f : R → Z by f(x) = the integer nearest x. 
This almost works: the trouble is that both 0 and 1 are equidistant 
from ½, so f(½) is not defined uniquely. 
General Properties of Functions 
Next we introduce an important subset associated with a function. 

Definition 5.6: If f : A → B is a function, then the image of f is the subset 
f(A) = {f(x) | x ∈ A} 

of B. Another common notation is im(f). 

The image of f is the set of values obtained by working out f(x) for all x in 
the domain. It need not be the whole codomain; for example if f : R → R 
has f(x) = x² then the image is the set of non-negative reals, and is not the whole 
codomain R. 

The lack of symmetry in the definition of a function may seem disturbing. 
We require f(x) to be defined for all x ∈ A, yet we do not require every b ∈ B 
to be of the form f(x). The reason is pragmatic. When we use a function, 
we want to be sure that it is defined, so knowledge of the precise domain is 
essential. However, it is less crucial to know exactly where the values f(x) lie, 
so we can choose the codomain to be whichever set is convenient. 

For instance, if we define 


f : N → R 
by 
f(n) = ∛(n!) 
then the image of f is the set of cube roots of factorials 

{1, ∛2, ∛6, ∛24, ∛120, ...} 


which is not a very nice set. Images in general can be pretty revolting. So it 
makes sense to define a function in terms of a codomain, and to leave aside 
the calculation of exactly which part of the codomain we really require, in 
the hope (often fulfilled) that it is not needed. If it is, we can work it out. 

This brings us to another minor point. Strictly speaking, we cannot talk of 
‘the’ codomain of a function. Consider 


$$f : \mathbb{R} \to \mathbb{R}, \quad f(x) = x^2,$$
$$g : \mathbb{R} \to \mathbb{R}^+, \quad g(x) = x^2,$$


where R⁺ = {x ∈ R | x ≥ 0}. The first has codomain R and the second R⁺, 
yet the formal definition of a function as a set of ordered pairs leads to 
the same set {(x, x²) | x ∈ R} in both cases. The functions f and g are equal. 




So ‘the’ codomain of a function is ambiguous. Any set that includes the image 
of the function will do as a codomain. The domain, on the other hand, is 
unique. 

We can remove this ambiguity by being more pedantic, and defining a 
function to be a triple (f, A, B) rather than just a set of ordered pairs f. But at 
this stage, the definition as a triple is off-putting and not worth the effort, and 
in any case the notation f : A → B tells us which of the possible codomains 
is intended in any particular instance. 

A familiar way to picture a function f : R → R is to draw its graph; we 
say more about this topic in the next section. For sets other than R it is often 
better to think of a function in terms of a picture like this: 


Fig. 5.1 A picture of a function 


For each x ∈ A the value of f(x) is to be found at the far end of the 
corresponding arrow. 
The definition of a function, expressed in pictorial terms, becomes: 


(F1’) Every element of A is at the tail end of a unique arrow. 
(F2’) All the arrowheads end up in B. 


This type of diagram is a pictorial device, on a par with Venn diagrams, 
but it is useful as a source of motivation and simple examples. 

Using such a picture, the image of f is the set of elements of B that lie on 
the sharp ends of the arrows: 


Fig. 5.2 Domain, codomain, and image 


The image of f is the whole codomain B if every element of B is at the end 
of some arrow. This motivates a more formal definition: 
Definition 5.7: A function f : A → B is a surjection (to B) or is onto B if 
each element of B is of the form f(x) for some x ∈ A. 


Whether a function is a surjection depends on the choice of codomain. 
So the statement ‘f is a surjection’ can be made only if it is clear from the 
context which codomain is intended—as will be the case in a phrase such 
as ‘f : A → B is a surjection’, where the codomain is B. The next examples 
clarify this. 


Examples 5.8: 


(1) f : R → R where f(x) = x². This is not a surjection to R, since no 
negative real number is the square of a real number; in particular, 
−1 ∈ R but is not of the form x² for any x ∈ R. 

(2) f : R → R⁺ where f(x) = x². This is a surjection to R⁺, since every 
non-negative real number has a square root, which is real. 

(3) f : A → B where A = {all circles in the plane}, B = R⁺, and 
f(x) = the radius of x. This is a surjection to R⁺, since given any 
positive real number we can find a circle with that number as radius. 
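The dependence on the codomain can be seen on a small finite example. In this sketch (our own illustration, with hypothetical helper names `is_function` and `is_surjective`) the same dict-encoded squaring function is tested against two different codomains.

```python
def is_function(f, domain, codomain):
    """Check the definition: the keys are exactly the domain (a dict already
    gives each key exactly one value), and every value lies in the codomain."""
    return set(f) == set(domain) and set(f.values()) <= set(codomain)


def is_surjective(f, codomain):
    """f is onto `codomain` if every element of it equals f(x) for some x."""
    return set(codomain) <= set(f.values())


A = {-2, -1, 0, 1, 2}
square = {x: x * x for x in A}

B1 = {-1, 0, 1, 4}   # contains -1, which is not a square
B2 = {0, 1, 4}       # exactly the squares of elements of A

for B in (B1, B2):
    assert is_function(square, A, B)
    print(sorted(B), is_surjective(square, B))
```

The same set of ordered pairs is a function to both B1 and B2, but it is a surjection only to B2.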


If no element of B lies at the end of two different arrows, we have another 
important type of function: 


Definition 5.9: A function f : A → B is an injection, one-one, or one-to-one, 
if for all x, y ∈ A, f(x) = f(y) implies x = y. 


This time the precise choice of codomain does not lead to any problems. 
If f is an injection for one choice of codomain, it is also an injection for any 
other choice. Here are some examples. 


Examples 5.10: 


(1) f : R → R where f(x) = x². This is not an injection, since f(1) = f(−1) 
but 1 ≠ −1. 

(2) f : R⁺ → R where f(x) = x². This is an injection: if x and y are 
non-negative real numbers and x² = y², then 0 = x² − y² = (x − y)(x + y). 
Therefore either x − y = 0 and x = y, or x + y = 0, which for non-negative 
x and y is possible only if x = y = 0. Either way, x = y. 

(3) f : R \ {0} → R where f(x) = 1/x. This is an injection, since if 1/x = 1/y 
then, taking reciprocals, x = y. 
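For a finite domain, injectivity amounts to "no value is taken twice", which can be checked by counting distinct values. A minimal sketch, using the same dict encoding as before (an assumption of ours) and a hypothetical helper `is_injective`; note that, as the text says, no codomain is needed.

```python
def is_injective(f):
    """f(x) = f(y) implies x = y, i.e. no value is taken at two different points."""
    return len(set(f.values())) == len(f)


square_on_A = {x: x * x for x in (-2, -1, 0, 1, 2)}
square_on_nonneg = {x: x * x for x in (0, 1, 2)}

print(is_injective(square_on_A))       # False, since f(1) = f(-1)
print(is_injective(square_on_nonneg))  # True, as in example (2)
```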


Nicest of all are functions with both of these properties: 
Definition 5.11: A function f : A → B is a bijection if it is both an injection 
and a surjection (to B). 


Again, this property depends on the choice of codomain. Another common 
term for the same property is one-to-one correspondence. However, it is 
easy to confuse this with ‘one-to-one’, so we shall avoid them both. Instead 
of saying that f is a bijection (injection, surjection), we often say that f is 
bijective (injective, surjective). Clearly f : A → B is a bijection if and only if 
every b ∈ B is of the form b = f(x) for a unique x ∈ A. 
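The criterion in the last sentence (every b ∈ B is hit exactly once) translates directly into counting preimages. This is a sketch on finite sets only, with a hypothetical helper `is_bijective` and the dict encoding assumed earlier.

```python
from collections import Counter


def is_bijective(f, codomain):
    """f : A -> B is a bijection iff every b in B equals f(x) for exactly one x."""
    if not set(f.values()) <= set(codomain):
        return False                  # some value lies outside B
    hits = Counter(f.values())        # hits[b] = number of x with f(x) = b
    return all(hits[b] == 1 for b in codomain)


f = {1: "a", 2: "b", 3: "c"}
print(is_bijective(f, {"a", "b", "c"}))       # True
print(is_bijective(f, {"a", "b", "c", "d"}))  # False: "d" is never hit
```

A missing key in a `Counter` counts as 0, so an element of B that is never hit makes the test fail, just as a value hit twice does.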

All combinations of injectivity and surjectivity can occur, as the following 
pictures illustrate: 


Fig. 5.3 Various kinds of function (panels: neither surjective nor injective; injective but not surjective; surjective but not injective; surjective and injective, that is, bijective) 


One very important, though trivial, function can be defined on each set A. 


Definition 5.12: The identity function i_A : A → A is defined by i_A(a) = a 
for all a ∈ A. 


This is obviously a bijection. 


The Graph of a Function 


There are two competing ways to picture a real function, by which we mean a 
function whose domain and range are subsets of R: the graph, and the 
blobs-and-arrows diagram. There are interesting connections between the two. A 
blobs-and-arrows picture of the function f(x) = x (x ∈ R) looks something 
like this: 
Fig. 5.4 An arrow diagram from R to R 


We can disentangle the arrows better if we place A horizontally and let it 
overlap B at 0: 


Fig. 5.5 Arrow diagram from horizontal to vertical 


However, it is more interesting to use arrows that run only vertically or 
horizontally: 
Fig. 5.6 Same diagram with vertical and horizontal arrows 


This makes it clear that the important thing is where the corner occurs. If 
we vary x, all the corners lie on a curve: 


Fig. 5.7 The corners of the arrows are on a curve 


Now we can eliminate the arrows, and what we get is the usual graph: 


Fig. 5.8 Eliminate the arrows to get the graph 


Conversely, given this graph, we can put the arrows back. Starting at x ∈ A 
we move vertically until we hit the graph, then horizontally until we hit B. 
This point will be f(x). 


Fig. 5.9 Recovering the arrows from the graph 


What is the graph set-theoretically? The plane is R × R, and the corner in 
the arrow from x to f(x) occurs at (x, f(x)). So the graph of f is the set 

$$\{(x, f(x)) \mid x \in \mathbb{R}\}.$$

But this, in formal terms, is the same as f. By drawing R × R as a plane, we 
are led to the graph as the natural picture for f. 
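Since a function is formally its set of ordered pairs, passing between a function and its graph loses nothing. The sketch below (our illustration; `graph` and `from_graph` are hypothetical names, and functions are dicts as before) builds the graph as a Python set of pairs and recovers the function, rejecting any set in which some x is paired with two different values.

```python
def graph(f):
    """The graph {(x, f(x)) | x in A} of a dict-encoded function."""
    return {(x, y) for x, y in f.items()}


def from_graph(pairs):
    """Recover the function from its graph, checking each x has one value only."""
    f = {}
    for x, y in pairs:
        if x in f and f[x] != y:
            raise ValueError(f"{x!r} is paired with two different values")
        f[x] = y
    return f


square = {x: x * x for x in range(-2, 3)}
G = graph(square)
print(sorted(G))                 # [(-2, 4), (-1, 1), (0, 0), (1, 1), (2, 4)]
assert from_graph(G) == square   # the graph carries exactly the same information
```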

For a general function f : A → B we need a corresponding picture. Now 
we have a way to draw A × B, and we use this instead of the plane. Thus the 
camel-lion-elephant function we met earlier has the ‘graph’: 


Fig. 5.10 Picturing a more general situation (points of A × B, with camel, lion, elephant along one axis and January, March along the other) 


By analogy with the previous function, suppose that we draw vertical 
arrows from elements of A, until we hit the graph, and then horizontal arrows 
until we hit B. The result is: 


Fig. 5.11 Recovering the arrows 


A little distortion recovers the blobs-and-arrows picture: 


Fig. 5.12 The blobs-and-arrows picture 
Strictly speaking, the graph picture for a function f : R → R should 
look like: 


Fig. 5.13 The picture from R to R 


The traditional picture, with A and B drawn on the plane as ‘axes’, is more 
familiar and convenient: 


Fig. 5.14 Drawing the axes on the picture 


But it should be remembered that these ‘axes’ are not part of the graph, 
but act as labels for the points (x, y) of the plane. 


Composition of Functions 
If f : A → B and g : C → D are two functions, and the image of f is a subset 
of C, then we can compose f and g by ‘first doing f, then g’. 


Fig. 5.15 The composition of functions 


In formal terms, we define: 


Definition 5.13: If f : A → B and g : C → D are functions and f(A) ⊆ C, 
the composition g ∘ f is the function 

$$g \circ f : A \to D$$

for which 

$$g \circ f(x) = g(f(x)).$$


Of course we must verify that g ∘ f is a function from A to D, but this is easy: for each x ∈ A we have f(x) ∈ f(A) ⊆ C, so g(f(x)) is defined and is a unique element of D. 

It is a pity that g ∘ f corresponds to ‘first f, then g’, since a more natural 
notation would seem to be f ∘ g. But the latter would make the definition 
read f ∘ g(x) = g(f(x)), which looks wrong. One way out is to write (x)f 
instead of f(x) and let composition be given by (x)f ∘ g = ((x)f)g. But this looks 
odd too! 

Composition of functions has a very useful property: it is associative, in 
the following sense: 


Proposition 5.14: Let f : A → B, g : C → D, h : E → F be functions 
such that the image of f is a subset of C and the image of g is a subset of E. 
Then the two functions 

$$h \circ (g \circ f) : A \to F, \qquad (h \circ g) \circ f : A \to F$$

are equal. 


Proof: By ‘equal’ here we mean that the two subsets of A × F that define the 
functions are equal; this in turn means that for each x ∈ A the two functions 
take the same value. Now 

$$h \circ (g \circ f)(x) = h(g \circ f(x)) = h(g(f(x))),$$
$$(h \circ g) \circ f(x) = h \circ g(f(x)) = h(g(f(x))),$$

which proves the proposition. 
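Composition and its associativity can be checked on small finite examples. This is a sketch under the dict encoding assumed earlier; the helper `compose(g, f)` is our own name and returns g ∘ f, that is, ‘first f, then g’, refusing to compose when the image of f is not inside the domain of g.

```python
def compose(g, f):
    """Return g o f ('first f, then g'); needs the image of f inside g's domain."""
    if not set(f.values()) <= set(g):
        raise ValueError("image of f is not a subset of the domain of g")
    return {x: g[f[x]] for x in f}


f = {1: "a", 2: "b", 3: "a"}        # f : A -> B
g = {"a": 10, "b": 20, "c": 30}     # g : C -> D, with f(A) a subset of C
h = {10: True, 20: False, 30: True} # h : E -> F, with g(C) a subset of E

left = compose(h, compose(g, f))    # h o (g o f)
right = compose(compose(h, g), f)   # (h o g) o f
print(left == right)                # True, as Proposition 5.14 predicts
```

A finite check like this illustrates the proposition; it is the two-line computation in the proof that establishes it in general.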


Identity functions have nice properties under composition: 


Proposition 5.15: If f : A → B is a function, then 

$$f \circ i_A = f, \qquad i_B \circ f = f.$$


Proof: This is a routine verification from the definitions. 


Inverse Functions 


We think of a function f : A → B as a rule that takes x ∈ A and does 
something to it; namely, the rule produces f(x) ∈ B. Sometimes we can find a 
function g that ‘undoes’ f. We call g an inverse function to f. However, some 
care is needed because inverse functions need not exist, and the order in 
which we perform f and g sometimes matters. 


Definition 5.16: Let f : A → B be a function. Then a function g : B → A 
is called a 

left inverse for f if g(f(x)) = x for all x ∈ A, 
right inverse for f if f(g(y)) = y for all y ∈ B, 
inverse for f if it is both a left and a right inverse for f. 

The three conditions may be stated in equivalent terms: 

$$g \circ f = i_A,$$
$$f \circ g = i_B,$$
$$g \circ f = i_A \ \text{and} \ f \circ g = i_B.$$


Here are some illustrations, using single arrows → for f and double arrows ⇒ for g. 


Fig. 5.16 Left inverse 


Fig. 5.17 Right inverse 


Fig. 5.18 Inverse 


These pictures suggest a useful criterion: 


Theorem 5.17: A function f : A → B has a: 


(a) left inverse if and only if it is injective (when A is non-empty) 
(b) right inverse if and only if it is surjective 
(c) inverse if and only if it is bijective. 


Proof: (a) Suppose f has a left inverse g. To prove f injective, suppose that 
f(x) = f(y). Then x = g(f(x)) = g(f(y)) = y, so f is an injection. Conversely, 
suppose that f is injective. If y ∈ B and y = f(x), define g(y) = x. By 
injectivity, this x is unique. If y is not an element of the image of f, no such x 
exists; we then pick any a ∈ A (this is where we need A to be non-empty) and 
define g(y) = a. Now g(y) is defined for all y ∈ B 
and g : B → A is a function. But g(f(x)) = x by the definition of g, so g is a 
left inverse. 

(b) Suppose that f has a right inverse g. If y ∈ B then y = f(g(y)), so y is of 
the form f(x) for x = g(y). Hence f is a surjection to B. Conversely, suppose 
that f is surjective. Let y ∈ B. Then y = f(x) for some x ∈ A, not necessarily 
unique. For each y ∈ B define g(y) to be one particular choice of an element 
in A for which f(g(y)) = y. Then g is a function, and a right inverse to f. 

(c) The function f has an inverse if and only if it has a left inverse g that is 
also a right inverse. Therefore if f has an inverse, it is both injective and 
surjective, hence bijective. Conversely, if f is bijective, it has a left inverse g 
by (a). Given y ∈ B, surjectivity gives y = f(x) for some x ∈ A, and then 
f(g(y)) = f(g(f(x))) = f(x) = y, so this g is 
also a right inverse. Hence f has an inverse. 
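The constructions in parts (a) and (b) of the proof can be carried out mechanically on finite sets. The following is a sketch only: the names `left_inverse` and `right_inverse` are hypothetical, functions are dicts as before, and the values of f are assumed to lie in the given codomain. The ‘arbitrary’ element a and the ‘one particular choice’ of x become whatever element Python happens to meet first.

```python
def left_inverse(f, codomain):
    """Build a left inverse of an injective f : A -> B, following part (a)."""
    if len(set(f.values())) != len(f):
        raise ValueError("f is not injective")
    if not f:
        raise ValueError("A is empty, so there is no element a to fall back on")
    a = next(iter(f))                 # an arbitrary choice of a in A
    g = {y: a for y in codomain}      # y outside the image of f goes to a
    for x, y in f.items():
        g[y] = x                      # the unique x with f(x) = y
    return g


def right_inverse(f, codomain):
    """Build a right inverse of a surjective f : A -> B, following part (b)."""
    g = {}
    for x, y in f.items():
        g.setdefault(y, x)            # keep one particular x with f(x) = y
    if set(g) != set(codomain):
        raise ValueError("f is not surjective")
    return g


f = {1: "p", 2: "q"}                  # injective, not surjective onto {p, q, r}
g = left_inverse(f, {"p", "q", "r"})
assert all(g[f[x]] == x for x in f)   # g o f = i_A

s = {1: "p", 2: "p", 3: "q"}          # surjective onto {p, q}, not injective
r = right_inverse(s, {"p", "q"})
assert all(s[r[y]] == y for y in {"p", "q"})   # s o r = i_B
```

Choosing a different a, or a different x in `setdefault`, gives a different left or right inverse, which is why these need not be unique.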


Example 5.18: 


(1) f : R → R, f(x) = x³. This is bijective and has inverse g : R → R, 
g(x) = ∛x. 
(2) f : R → R, f(x) = x². As it stands this is neither injective nor 
surjective, so it has neither kind of inverse. So what has happened to square 
roots? We can make f surjective by taking R⁺ = {x ∈ R | x ≥ 0} as 
codomain. Now f : R → R⁺ is surjective, and g(x) = √x (positive 
square root) is a right inverse since f(g(x)) = (√x)² = x. But it is not 
a left inverse, since 

$$\sqrt{x^2} = x \ \text{ if } x \ge 0, \qquad \sqrt{x^2} = -x \ \text{ if } x < 0.$$

(3) We assume for this example properties of exponentials and 
logarithms that we have not proved rigorously in this book. Let f : R → R 
be given by f(x) = eˣ. Then f is injective, for if eˣ = eʸ then eˣ⁻ʸ = 1 
so x − y = 0 so x = y. Moreover, the function f has a left inverse, 
defined by (say) 

$$g(y) = \log y \ \ (y > 0), \qquad g(y) = 273 \ \ (y \le 0).$$

To check this, calculate g(f(x)): f(x) = eˣ, which is positive, so 
g(f(x)) = g(eˣ) = log eˣ = x. The arbitrary 273 does not enter into 
this calculation; it is there merely to define g on the whole of R. Any 
other definition would do for y ≤ 0, because of the 
way the calculation works. 

(4) More sensibly, consider f : R → R* where R* = {x ∈ R | x > 0} and 
f(x) = eˣ. Now f is a bijection, and g : R* → R, with g(y) = log y, is 
an inverse: 

$$e^{\log y} = y, \qquad \log e^x = x.$$

(5) In this example we assume properties of trigonometric functions. 
Consider f : R → R, f(x) = sin x. This is neither injective nor 
surjective, so it has neither kind of inverse. But what about sin⁻¹ x (or 
arcsin x), as found in trigonometric tables? 

The answer depends on exactly what we are trying to achieve. 
If sin⁻¹ x is defined to be the unique y with −π/2 ≤ y ≤ π/2 
such that sin y = x, then this is a right inverse to 
f : R → {x ∈ R | −1 ≤ x ≤ 1}. However, it is not a left inverse; for 
instance 

$$\sin^{-1}(\sin 6\pi) = \sin^{-1} 0 = 0 \ne 6\pi.$$

Sometimes it is said that sin⁻¹ is ‘multivalued’. According to our 
definition, it cannot then be a function in the legal sense of the term. 

(6) The most satisfactory procedure is as follows. Let 

$$f : \{x \in \mathbb{R} \mid -\pi/2 \le x \le \pi/2\} \to \{x \in \mathbb{R} \mid -1 \le x \le 1\}$$

be defined by f(x) = sin x. Then f is a bijection, and sin⁻¹ is an inverse 
function for this f. 
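The same phenomena show up in floating-point arithmetic, which is only an approximation to the reals, so this is an illustration rather than a proof. The sketch uses Python's `math.sqrt` and `math.asin` (which returns values in [−π/2, π/2]); the helper `square` is our own.

```python
import math


def square(x):
    return x * x


# Example (2): sqrt is a right inverse of x -> x^2 on the non-negative reals ...
print(square(math.sqrt(9.0)))     # 9.0
# ... but not a left inverse: sqrt(x^2) = -x when x < 0.
print(math.sqrt(square(-3.0)))    # 3.0, not -3.0

# Example (5): asin only returns values in [-pi/2, pi/2], so it cannot undo sin at 6*pi.
print(math.asin(math.sin(6 * math.pi)))   # roughly 0 (up to rounding), not 6*pi
```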


Left and right inverses need not be unique, because their construction 
involves arbitrary choices. But inverses are unique. 


Proposition 5.19: If a function has both a left inverse and a right inverse, 
then it has an inverse. This inverse function is unique, and every left or right 
inverse is equal to it. 

Proof: Let f : A → B have both a left and a right inverse. Then by theorem 
5.17, f is a bijection, and has an inverse F. If g is any left inverse, then 

$$g = g \circ i_B = g \circ (f \circ F) = (g \circ f) \circ F = i_A \circ F = F.$$

Similarly, if h is any right inverse then h = F. Since an inverse is in particular 
a left inverse, this also proves F unique. 


The notation for an inverse function to f : A → B, provided it exists (which 
occurs precisely when f is a bijection), is 

$$f^{-1} : B \to A.$$

WARNING. Don’t confuse f⁻¹(x) with the reciprocal 1/f(x). (For example, if 
f : R⁺ → R⁺ is given by f(x) = x², then f⁻¹(x) = √x, whereas 1/f(x) = 1/x².) 


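Proposition 5.20: If f : A → B and g : B → C are bijections, then g ∘ f : 
A → C is a bijection, and 

$$(g \circ f)^{-1} = f^{-1} \circ g^{-1}.$$

Proof: It is clear that g ∘ f is a bijection. It is also easy to verify directly that 
f⁻¹ ∘ g⁻¹ is a left inverse, since 

$$\begin{aligned} (f^{-1} \circ g^{-1}) \circ (g \circ f) &= f^{-1} \circ (g^{-1} \circ (g \circ f)) \\ &= f^{-1} \circ ((g^{-1} \circ g) \circ f) \\ &= f^{-1} \circ (i_B \circ f) \\ &= f^{-1} \circ f \\ &= i_A. \end{aligned}$$

Hence, by proposition 5.19, it is the inverse. 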
This calculation is illustrated below: it is much less horrendous than it may 
appear. 


Fig. 5.19 Composing inverse functions 
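On finite bijections the formula (g ∘ f)⁻¹ = f⁻¹ ∘ g⁻¹ can be checked directly. In this sketch (our own illustration; `compose` and `inverse` are hypothetical helpers on dicts) the inverse is formed by swapping each pair (x, f(x)), which gives the inverse of f regarded as a bijection onto its image.

```python
def compose(g, f):
    """g o f for dict-encoded functions (image of f inside the domain of g)."""
    return {x: g[f[x]] for x in f}


def inverse(f):
    """Inverse of a bijection onto its image: swap every pair (x, f(x))."""
    inv = {y: x for x, y in f.items()}
    if len(inv) != len(f):
        raise ValueError("f is not injective, so it has no inverse")
    return inv


f = {1: "a", 2: "b", 3: "c"}          # f : A -> B, a bijection
g = {"a": "Y", "b": "Z", "c": "X"}    # g : B -> C, a bijection

lhs = inverse(compose(g, f))           # (g o f)^-1
rhs = compose(inverse(f), inverse(g))  # f^-1 o g^-1
print(lhs == rhs)                      # True
```

Note the order reversal: g⁻¹ ∘ f⁻¹ would not even be defined here, since f⁻¹ takes values in A while g⁻¹ expects elements of C.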


Restriction 


Definition 5.21: If f : A → B is a function and X ⊆ A, the restriction of f 
to X is the function 

$$f|_X : X \to B$$

for which 

$$f|_X(x) = f(x) \quad (x \in X).$$


This function differs from f only in that we forget about those x that do not 
lie in X. 
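Restriction simply discards the pairs whose first element lies outside X. A minimal sketch with a hypothetical helper `restrict` on dict-encoded functions; restricting x ↦ x² to non-negative arguments makes it injective, as in Examples 5.10.

```python
def restrict(f, X):
    """The restriction f|X : X -> B, defined only when X is a subset of the domain."""
    if not set(X) <= set(f):
        raise ValueError("X is not a subset of the domain of f")
    return {x: f[x] for x in X}


square = {x: x * x for x in range(-2, 3)}
half = restrict(square, {0, 1, 2})
print(half)                                   # {0: 0, 1: 1, 2: 4}
print(len(set(half.values())) == len(half))   # True: the restriction is injective
```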

For example, if f : R → R has f(x) = sin x, and X = {x ∈ R | 0 < x < 6π}, 
then the graphs of f and f|X are like this: 


Fig. 5.20 The restriction of a function (labels: domain of f, domain of f|X) 
Restriction is a relatively trivial operation. Its main use is to concentrate 
attention on how f behaves on the subset X. Sometimes this is useful: we noted 
above that sin : R → I is not a bijection when I = {x ∈ R | −1 ≤ x ≤ 1}. If 


X = {x ∈ R | −π/2 ≤ x ≤ π/2}, 


however, then sin|X : X → I is a bijection. 


Definition 5.22: Restricting the identity function i_A : A → A to a subset 
X ⊆ A gives the inclusion function 

$$i_A|_X : X \to A$$

for which 

$$i_A|_X(x) = x \quad (x \in X).$$


This is therefore the same function as i_X, but with a different choice of 
codomain which leads to a different emphasis: it provides a formal statement 
of how X sits inside A. 


Sequences and n-tuples 


We can now use functions to tidy up some questions that arose earlier. In 
particular we can give precise definitions of sequences and n-tuples. Earlier 
we gave definitions of ordered pairs, triples, quadruples, and so on, but no 
general prescription. 


Definition 5.23: Let Xₙ be the set {1, 2, 3, ..., n} = {x ∈ N | 1 ≤ x ≤ n}. If 
S is a set then an n-tuple of elements of S is defined to be a function 

$$f : X_n \to S.$$

This function specifies elements f(1), f(2), ..., f(n) of S. If we change 
notation to (f₁, f₂, ..., fₙ) we see that two n-tuples (f₁, ..., fₙ) and (g₁, ..., gₙ) 
are equal if and only if f₁ = g₁, f₂ = g₂, ..., fₙ = gₙ, because two functions 
with the same domain are equal exactly when they take the same value at 
every point. This is what an n-tuple should look like. 
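The definition can be mirrored by encoding a tuple as a dict on {1, ..., n}. This is a sketch with a hypothetical helper `as_function`; Python's dict equality compares keys and values, which is exactly equality of functions with the same domain.

```python
def as_function(t):
    """Encode an n-tuple (t1, ..., tn) as the function {1: t1, ..., n: tn} on X_n."""
    return {i: v for i, v in enumerate(t, start=1)}


print(as_function(("p", "q", "r")))   # {1: 'p', 2: 'q', 3: 'r'}

# Equal as functions exactly when equal entry by entry, so order matters:
assert as_function(("p", "q")) == as_function(("p", "q"))
assert as_function(("p", "q")) != as_function(("q", "p"))
```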

Similarly, a sequence a₁, a₂, ..., described earlier as an ‘endless list’, may 
be rigorously defined as a function 

$$f : \mathbb{N} \to S$$

where now we think of f(n) as aₙ. 

In the case of ordered pairs, the new definition of (f₁, f₂) turns out not to 
be the same as Kuratowski’s definition, given in chapter 4. However, it has the 
same key property for ordered pairs: (f₁, f₂) = (g₁, g₂) if and only if f₁ = g₁ 
and f₂ = g₂. This is really all we need, so once again we see that if we focus on 
the property in a formal definition, rather than the object defined, we get the 
same fundamental mathematical idea. It is another example of a crystalline 
concept. 


Functions of Several Variables 


In calculus we encounter ‘functions of two variables’, such as 


$$f(x, y) = x^2 - 3y^2 + \cos xy \quad (x, y \in \mathbb{R}).$$


It is not necessary to go through the whole rigmarole again to make these 
functions precise. The notation makes it clear that f is just an ordinary 
function defined on the set of ordered pairs (x, y), that is, 

$$f : \mathbb{R} \times \mathbb{R} \to \mathbb{R}.$$


In general, if A and B are sets then a function of two variables a ∈ A, b ∈ B is 
a function f : A × B → C. Similarly, functions of n variables are just functions 
defined on a set of n-tuples. 


Binary Operations 


In many areas of mathematics, it is common to combine two numbers 
together to get another number, or two objects of a given kind to get another 
object of the same kind. This leads to the concept of a binary operation on a 
set A, which can be formally defined as: 


Definition 5.24: A binary operation on a set A is a function f : A × A → A. 


Examples 5.25: Familiar examples include: 


(1) Addition on N: α : N × N → N, α(x, y) = x + y. 
(2) Multiplication on N: μ : N × N → N, μ(x, y) = xy. 
(3) Subtraction on Z: σ : Z × Z → Z, σ(x, y) = x − y. 
(4) Division on the non-zero elements of Q. Here we let 
Q* = {x ∈ Q | x ≠ 0} and define δ : Q* × Q* → Q* by δ(x, y) = x/y. 
(5) Union of sets. Let A = P(X), the set of all subsets of a given set X. 
Define u : A × A → A by u(Y₁, Y₂) = Y₁ ∪ Y₂. 
(6) Composition of functions. For any set X, let M be the set of all 
functions from X to X, so that f ∈ M means f : X → X. Define 


c : M × M → M by c(g, f) = g ∘ f. 


Most occurrences of binary operations in mathematics do not use function 
notation f(x, y). Instead, they appear in the form x ∗ y for some symbol ∗. In 
the above examples we have x + y, xy, x − y, x/y, x ∪ y, x ∘ y. For this reason we 
usually denote a binary operation by ∗ : A × A → A and the image ∗(x, y) by 
x ∗ y. By analogy with examples (2) and (6), x ∗ y is called the product, or 
composite, of x and y. In example (2), there is no intervening symbol at all. This economical 
notation is used often in other mathematical situations where there is no 
danger of confusion. For instance the composite of functions in (6) is usually 
written as gf instead of g ∘ f. You have already become accustomed to such 
conventions when you learned to read 


2π as 2 times π, 
2½ as 2 plus ½, 
21 as 2 times ten plus 1. 


Using x ∗ y notation, we do not usually expect x ∗ y and y ∗ x to be the same. 
For example, if ∗ is subtraction, then 2 − 1 ≠ 1 − 2. However, when they are 
the same, the algebra of ∗ can be simplified, so we are led to: 


Definition 5.26: If x ∗ y = y ∗ x for all x, y ∈ A, then ∗ is commutative. 


In examples (1), (2), and (5), the binary operation is commutative; in 
examples (3) and (4), it is not. The operation in example (6) is 
non-commutative if X has more than one element. Indeed, if a, b ∈ X, a ≠ b, 
define f(a) = f(b) = a, g(a) = g(b) = b, and f(x) = g(x) = x otherwise. 
Then (g ∗ f)(a) = g(f(a)) = g(a) = b, but (f ∗ g)(a) = f(g(a)) = f(b) = a, so g ∗ f ≠ f ∗ g. 
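On a finite set, commutativity can be tested by trying every pair. The sketch below uses a hypothetical helper `is_commutative` and reproduces the text's counterexample for composition with X = {a, b}, encoding functions as dicts. A finite check can refute commutativity, but on an infinite set such as Z it only samples some pairs and proves nothing in the positive direction.

```python
from itertools import product


def is_commutative(op, A):
    """Check x * y == y * x for every pair drawn from the finite set A."""
    return all(op(x, y) == op(y, x) for x, y in product(A, repeat=2))


def compose(g, f):
    """g o f for dict-encoded functions X -> X."""
    return {x: g[f[x]] for x in f}


sample = range(-3, 4)
print(is_commutative(lambda x, y: x + y, sample))   # True on this sample
print(is_commutative(lambda x, y: x - y, sample))   # False: 2 - 1 != 1 - 2

# The counterexample for composition, with X = {a, b}.
f = {"a": "a", "b": "a"}
g = {"a": "b", "b": "b"}
print(compose(g, f)["a"], compose(f, g)["a"])        # b a
```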

Unless we know that a binary operation is commutative, it is essential to 
maintain the order of elements in a product. Such a product can be extended 
to three (and more) elements. Given x, y, z ∈ A, then x ∗ y ∈ A, so we can 
form the product of this with z. Brackets are introduced at this stage, writing 
(x ∗ y) ∗ z to denote the result and to distinguish it from x ∗ (y ∗ z). Although 
the latter has x, y, z taken in the same order, it is the product of x and y ∗ z, 
and might conceivably be different. For instance, (3 − 2) − 1 ≠ 3 − (2 − 1). 


Definition 5.27: If (x ∗ y) ∗ z = x ∗ (y ∗ z) for all x, y, z ∈ A, then ∗ is 
associative. 
The operations in examples (1), (2), (5), and (6) are associative, but those 
in (3) and (4) are not. 
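Associativity can be tested the same way, over all triples. In this sketch (hypothetical helper `is_associative`, functions encoded as dicts), subtraction fails on {1, 2, 3}; and for example (6) with X = {a, b} the set M has only 2² = 4 elements, so checking all 64 triples is a complete verification for that particular X.

```python
from itertools import product


def is_associative(op, A):
    """Check (x * y) * z == x * (y * z) for all triples from the finite set A."""
    return all(op(op(x, y), z) == op(x, op(y, z)) for x, y, z in product(A, repeat=3))


def compose(g, f):
    """g o f for dict-encoded functions X -> X."""
    return {x: g[f[x]] for x in f}


print(is_associative(lambda x, y: x - y, range(1, 4)))  # False: (3-2)-1 != 3-(2-1)

# Example (6) with X = {a, b}: list all functions X -> X as dicts.
X = ["a", "b"]
M = [dict(zip(X, values)) for values in product(X, repeat=len(X))]
print(len(M), is_associative(compose, M))               # 4 True
```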

When developing the number concept, commutative and associative binary 
operations (such as addition and multiplication) are essential building 
blocks. Elementary algebra makes repeated use of these properties, but when 
we first encounter algebraic ideas the properties are seldom made explicit, 
because the algebraic symbols stand for numbers. In more advanced algebra, 
where the symbols can be far more general, commutativity and associativity 
may or may not be valid, so we have to pay proper attention to them. 

Just as we have binary operations f : A × A → A, we can go on to define 
ternary operations t : A × A × A → A, and so on. It is even possible to 
think of a function g : A → A as a ‘unary operation’ to begin this hierarchy. 
Such concepts do arise from time to time, but they do not have the central 
importance of binary operations in mathematics. 


Indexed Families of Sets 


At the end of chapter 3 we considered sets S whose elements are themselves 
sets, such as S = {S₁, ..., Sₙ} where each Sᵣ is a set. Using the function 
concept, we can extend this notation. If Nₙ = {1, 2, ..., n}, then (provided the 
Sᵣ are distinct) there is a bijection f : Nₙ → S given by f(r) = Sᵣ. There is no 
reason to restrict attention to Nₙ here. 


Definition 5.28: If A is any set, every element of S is a set, and f : A → S 
is a bijection, then we say that S is an indexed family of sets, and write 

$$S = \{S_\alpha \mid \alpha \in A\}, \quad \text{where } S_\alpha = f(\alpha).$$


In this situation A is called the index set. 


The union 

$$\bigcup S = \{x \mid x \in S_\alpha \text{ for some } \alpha \in A\}$$

of such an indexed family is alternatively denoted by 

$$\bigcup_{\alpha \in A} S_\alpha,$$

and the intersection 

$$\bigcap S = \{x \mid x \in S_\alpha \text{ for all } \alpha \in A\}$$

by 

$$\bigcap_{\alpha \in A} S_\alpha.$$

When A = Nₙ, these are often denoted by $\bigcup_{r=1}^{n} S_r$ and $\bigcap_{r=1}^{n} S_r$. When A = N, the notation is sometimes written $\bigcup_{r=1}^{\infty} S_r$, $\bigcap_{r=1}^{\infty} S_r$. The ‘∞’ symbol in these expressions is part of the historical development of the subject; using modern set-theoretic notation, these are written as $\bigcup_{r \in \mathbb{N}} S_r$ and $\bigcap_{r \in \mathbb{N}} S_r$. 
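For a finite index set, the union and intersection of an indexed family can be computed directly. This sketch assumes (our choice) that the family is stored as a dict from each index α to the set S_α; it does not check the distinctness that Definition 5.28 requires.

```python
# An indexed family {S_alpha | alpha in A}, stored as a dict alpha -> S_alpha.
family = {1: {1, 2, 3}, 2: {2, 3, 4}, 3: {3, 4, 5}}   # index set A = {1, 2, 3}

union = set().union(*family.values())               # x in S_alpha for some alpha
intersection = set.intersection(*family.values())   # x in S_alpha for all alpha

print(union)          # {1, 2, 3, 4, 5}
print(intersection)   # {3}
```

With an empty index set, `set.intersection` raises an error, a reminder that the intersection of an empty family needs separate care.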


Exercises 


In these exercises, any required properties of exponential, logarithmic, and 
trigonometric functions may be assumed without proof. 


1. Find the images of the following functions f : R → R: 
(a) f(x) = x² 
(b) f(x) = x-4 
(c) f(x) = x² + 2x + 2 
(d) f(x) = x² + cos x 
(e) f(x) = 1/x if x ≠ 0, f(0) = 1 
(f) f(x) = |x| 
(g) f(x) = x? + x - |x)? 

(h) f(x) = x" +x. 

2. For each of the functions f : R → R defined above, state whether it is 
(a) injective, (b) surjective, (c) bijective. 

3. The following functions are to be defined so that their codomain is R, 
and their domains are certain subsets of R. Say in each case what the 
largest possible domain is. 

(a) f(x) = log x 

(b) f(x) = log log cos x 

(c) f(x) = -x 

(d) f(x) = log(1 − x²) 

(e) f(x) = log(sin²x) 

(E) f) = e 

(g) f(x) =1/ (e - 1) 

(h) f(x) = √((x − 1)(x − 2)(x − 3)(x − 4)(x − 5)(x − 6)) (positive square 
root). 


Find the image of f in each case. 
4. Let S be the set of circles in the plane, and let f : S → R be defined by 
f(s) = the area of s. 


Is f injective? Surjective? Bijective? 

Now let T be the set of circles in the plane whose centre is the origin, 
R* = {x ∈ R | x > 0}, and define g : T → R* by 

g(t) = the length of the circumference of t. 

Is g injective? Surjective? Bijective? 
5. If A has two elements and B three, how many different functions are 
there from A to B? From B to A? How many, in each case, are injective? 
Surjective? Bijective? 
6. If A has n elements and B has m elements (n, m ∈ N), find the number 
of functions from A to B. 
7. Show that if A = ∅, B ≠ ∅, then, according to the set-theoretic 
definition, there is precisely one function from A to B. Show that if 
A ≠ ∅, B = ∅, there are none. How many functions are there from ∅ 
to ∅? 


8. Give examples of functions f : Z → Z that are: 
(a) neither injective nor surjective 

(b) injective but not surjective 

(c) surjective but not injective 

(d) surjective and injective. 


9. If f : A → B show that, for X ⊆ A and Y ⊆ B, the formulas 
f̂(X) = {f(x) | x ∈ X}, 
f̌(Y) = {x ∈ A | f(x) ∈ Y} 
define two functions f̂ : P(A) → P(B), f̌ : P(B) → P(A), where P(X) 
denotes the set of all subsets of X. Prove that for all X₁, X₂ ⊆ A and 
Y₁, Y₂ ⊆ B, 
(a) f̂(X₁ ∪ X₂) = f̂(X₁) ∪ f̂(X₂) 
(b) f̂(X₁ ∩ X₂) ⊆ f̂(X₁) ∩ f̂(X₂), but equality need not hold 
(c) f̌(Y₁ ∪ Y₂) = f̌(Y₁) ∪ f̌(Y₂) 
(d) f̌(Y₁ ∩ Y₂) = f̌(Y₁) ∩ f̌(Y₂). 
Can we improve (b) to equality if f is known to be surjective? Injective? 
Bijective? 
In textbooks the usual notation is f̂(X) = f(X), f̌(Y) = f⁻¹(Y); for 
clarity we have used the notation above. 
10. Define binary operations ∗ on Z by 
(a) x ∗ y = x − y 
(b) x ∗ y = |x − y| 
(c) x ∗ y = x + y + xy 
(d) xx y= 3(x+y + $((-1) +1) + 1). 
Verify that these are binary operations. Which are commutative? 
Associative? 
CHAPTER 6 


Mathematical Logic 


The essential quality of mathematics that binds it together in a coherent 
way is the use of mathematical proof to deduce new results 
from known ones, building up a strong and consistent theory. These 
techniques include some that are unusual in everyday life. Perhaps the most 
interesting of them is the method of proof by contradiction (or ‘reductio ad 
absurdum’ as it was called in more classically oriented times). To show 
something is true by this method, we assume that it is false and then demonstrate 
that this assumption leads to a contradiction. For example: 


Proposition 6.1: The least upper bound of S = {x ∈ R | x < 1} is 1. 


Proof: Certainly 1 is an upper bound. Let K be another upper bound. 
Suppose K < 1; then, by simple arithmetic, K < ½(K + 1) < 1. This means 
that K < ½(K + 1) ∈ S, contradicting the fact that K is an upper bound. Thus 
the assumption K < 1 must be false, so K ≥ 1, and 1 is the least upper bound. 


This is a typical case of this kind of reasoning. To analyse it more closely, 
let P stand for the statement ‘If K is an upper bound for S then K ≥ 1’. The 
major part of the given proof is to establish the truth of P. We assumed P false 
(that is, there does exist an upper bound K < 1 for S) and a simple argument 
led to a contradiction. If the argument is correct, then P cannot be false—so 
it must be true. 

To carry through a proof of this nature, and to be certain of its validity, we 
must make sure of two vital ingredients. 

In the first place, the statement P (and all other statements in the course 
of the proof, for that matter) must be clearly true or clearly false, although 
at the time we may not always know which. In everyday conversation we 
meet comments like ‘Almost all drivers exceed the speed limit at some time 
or other.’ This sort of remark would be useless for a contradiction argument. 
To refute it, is it enough to find just one person who always obeys the speed 
limit? Do we need to find a ‘substantial number’ (whatever that means) or 
even a majority? Everyday language is full of generalities that are vaguely true 
in most cases, but perhaps not all. Mathematical proof is made of sterner 
stuff. No such generalities are allowed; all the statements involved must be 
clearly true or false. 

The second essential factor in a proof by contradiction is that the 
arguments used in the course of a proof must have no flaws. Only if this is so 
can we be sure in a proof by contradiction that the false link in the chain of 
argument is the initial assumption: P is false. 

An old music hall joke goes something like this: 


COMEDIAN: You’re not here. 
STRAIGHT MAN: Don’t be silly, of course I am. 
COMEDIAN: You’re not, and I’ll prove it to you... Look, you’re not 
in Timbuktu. 
STRAIGHT MAN: No. 
COMEDIAN: You’re not at the South Pole. 
STRAIGHT MAN: Of course I’m not. 
COMEDIAN: If you’re not in Timbuktu or at the South Pole, you 
must be somewhere else. 
STRAIGHT MAN: Of course I’m somewhere else! 
COMEDIAN: Well, if you’re somewhere else, you can’t be here! 


We are amused by this sort of thing, and we all see the logical flaw. But for 
beginners in mathematical proof techniques it exposes a deep-seated distrust 
of proof by contradiction. What if some similar ambiguity of terminology 
happens by accident or sleight of hand in the middle of the proof? When 
you were first confronted with a proof by contradiction that √2 is irrational, 
were you convinced straight away that it was correct, without any degree of 
suspicion? Such distrust is fully justified, and the only way to allay it is to 
make sure that our mathematical logic is flawless. 

In the rest of this chapter we concentrate on the precise use of 
mathematical language and basic terminology in logic. In the following chapter we 
return to techniques of mathematical proof. 


Statements 


As we have just seen, it is essential that every statement in a mathematical 
proof is clearly either true or false. Typical instances are: 
Examples 6.2: 


(i) 2 + 3 = 5. 
(ii) The least upper bound of a bounded non-empty subset of R is 
unique. 
(iii) There is an upper bound K for S = {x ∈ R | x < 1} such that K < 1. 
(iv) √2 is irrational. 


Here (i), (ii), and (iv) are true, but (iii) is false. In mathematics we are 
naturally more interested in true statements than false ones, but contradiction 
arguments make it convenient to allow both types of statement. 

To distinguish between true and false statements we say that each state- 
ment has a truth value denoted by the letters t or f, with the obvious 
interpretation of these symbols: t = true, f = false. Saying that a statement 
has truth value t is just a fancy way of saying that it is true. 

Given a statement P, the sentence ‘P is false’ is also a statement, and it 
has the opposite truth value to P. For example, if P is the false statement 
‘2+2 = 5’, then ‘2+ 2 = 5 is false’ is a true statement. In logical terminology 
‘P is false’ is usually written 


$$\neg P.$$


This is also called ‘the negation of P’ and may be read simply as ‘not P’. It
is a convenient shorthand notation; however, when an actual statement is
substituted for P it may not read grammatically. In the above example, ‘not P’
would read ‘not 2 + 2 = 5’, which sounds peculiar. The equivalent statements
‘2 + 2 = 5 is false’ or ‘2 + 2 ≠ 5’ are more euphonious. When translating ‘not
P’ into words, it is customary to rephrase it in a suitable way to make it read
smoothly.


Predicates 


A particularly important type of assertion in mathematics is the predicate,
introduced in chapter 3. Recall that a predicate is a sentence involving a
symbol, such as x, which is either clearly true or clearly false when we replace x
by any element of a set X. For instance, a typical mathematical predicate is
‘the real number x is not less than 1’. If we denote this by P(x), then P(2) is
true, P(0) is false, P(π/4) is false, and so on. If we find the truth value of P(a)
for every a ∈ ℝ, we get a truth function T_P : ℝ → {t, f} for which T_P(a) = t
if P(a) is true, and T_P(a) = f if P(a) is false.
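A truth function is easy to model in code. The following is a minimal sketch under assumptions of our own: Python floats stand in for real numbers, and the names `P` and `T_P` are hypothetical.

```python
import math

def P(x: float) -> bool:
    """The predicate 'the real number x is not less than 1'."""
    return not (x < 1)

def T_P(a: float) -> str:
    """Truth function T_P: R -> {t, f}, with floats standing in for reals."""
    return "t" if P(a) else "f"

print(T_P(2), T_P(0), T_P(math.pi / 4))  # t f f
```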

This concept dovetails very nicely with our ideas about set theory. The
predicate P(x) partitions ℝ into two non-overlapping subsets, one containing
the elements for which P(x) is true, the other containing the elements for
which P(x) is false. The first of these is denoted {x ∈ ℝ | P(x)}. For example
{x ∈ ℝ | x ≥ 1} is the set just described. The other is written {x ∈ ℝ | ¬P(x)},
which in the example becomes {x ∈ ℝ | x < 1}.

This situation mirrors what happens in general. For any predicate P(x) on a set S we
get a truth function as above. Then, for a ∈ S, we have

$$a \in \{x \in S \mid P(x)\} \quad\text{if and only if}\quad P(a) \text{ is true},$$
$$a \in \{x \in S \mid \neg P(x)\} \quad\text{if and only if}\quad P(a) \text{ is false}.$$
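On a finite set the partition can be computed directly. In this sketch the small set `S` of integers is a hypothetical stand-in for ℝ, and Python set comprehensions play the role of {x ∈ S | P(x)}.

```python
S = range(-3, 4)  # hypothetical finite stand-in for R: the integers -3, ..., 3

def P(x: int) -> bool:
    """'x is not less than 1'."""
    return not (x < 1)

true_set = {x for x in S if P(x)}        # {x in S | P(x)}
false_set = {x for x in S if not P(x)}   # {x in S | not P(x)}

# The two sets do not overlap and together make up all of S.
print(true_set.isdisjoint(false_set), true_set | false_set == set(S))  # True True

# a is in {x in S | P(x)} exactly when P(a) is true.
print(all((a in true_set) == P(a) for a in S))  # True
print(sorted(true_set), sorted(false_set))       # [1, 2, 3] [-3, -2, -1, 0]
```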


Rather than using vague remarks like ‘a predicate is some sort of
statement . . .’ we could use truth functions to give a set-theoretic definition.
Suppose we define a truth function T_P on a set S to be any function
T_P : S → {t, f}. Then we could propose the definition: ‘a predicate P(x)
associated with T_P is any sentence equivalent to “T_P(x) = t”’. The only trouble
with this approach is that predicates that appear different may have the same
truth function. For example,


P₁(x): ‘x is an upper bound for {s ∈ ℝ | s < 1}’,
P₂(x): ‘x ≥ 1’.


It is a major part of a mathematician’s job to show that such predicates are
equivalent, or, more generally, that the truth of one implies the truth of the
other. Therefore the predicates dealt with by practising mathematicians carry
more information than the truth functions just described. Explaining this is a bit like explaining colour by
pointing to something and saying ‘that’s blue’. A formal definition needs a
lot to set it up; this would be appropriate in a formal course on mathematical
logic, but it seems pointless here.

If more than one variable occurs in a sentence, we talk about a ‘predicate
in two variables’, or ‘three variables’, and so on. For example the sentence
‘x > y’ is a predicate (which we will denote by Q(x, y)) in two variables. If real
numbers are substituted for x and y then we get a statement. For instance,
Q(3, 2) is true, but Q(π, 10 + √2) is false. Here the truth function can be
considered as

$$T_Q : \mathbb{R} \times \mathbb{R} \to \{t, f\}$$
where
$$T_Q(x, y) = t \text{ if } Q(x, y) \text{ is true and } T_Q(x, y) = f \text{ if } Q(x, y) \text{ is false.}$$


In the same way we can consider ‘x² + y² = z’ as a predicate in three
variables x, y, z ∈ ℝ which we denote by P(x, y, z). The truth function is
T_P : ℝ × ℝ × ℝ → {t, f}, where

$$T_P(x, y, z) = \begin{cases} t & \text{if } x^2 + y^2 = z, \\ f & \text{if } x^2 + y^2 \neq z. \end{cases}$$
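The truth functions T_Q and T_P can be written the same way. This is only a sketch. Floats approximate reals, and because floating-point equality is unreliable, the test of x² + y² = z is restricted to integer inputs (an assumption of ours).

```python
import math

def T_Q(x: float, y: float) -> str:
    """Truth function of Q(x, y): 'x > y'."""
    return "t" if x > y else "f"

def T_P(x: int, y: int, z: int) -> str:
    """Truth function of P(x, y, z): 'x^2 + y^2 = z' (integers, so == is exact)."""
    return "t" if x**2 + y**2 == z else "f"

print(T_Q(3, 2), T_Q(math.pi, 10 + math.sqrt(2)))  # t f
print(T_P(3, 4, 25), T_P(1, 1, 3))                 # t f
```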

In practice, mathematicians do not always mention explicitly the set to 
which a predicate refers, assuming that it is implied by the context. For 
example, the predicate ‘x > 3’ is evidently meant to apply to real numbers
x, whereas ‘n > 3’ refers to integers n. This follows from the standard
convention that unless otherwise stated symbols x, y, z refer to real numbers. 

In particular, when we write ‘x > 3’ we assume that no one would dream 
of substituting something for x that doesn’t make sense. In the same way, 
it is a time-honoured convention that certain letters normally stand for 
elements from a specified set. For example, n is usually used to denote a nat- 
ural number, or perhaps an integer. In this context the predicate n > 3 
would be taken to refer only to natural numbers. We have already seen 
cases of this earlier in the book, for instance in the definition of convergence 
(Definition 2.7 on page 35) we wrote: 


A sequence (aₙ) of real numbers tends to a limit l if, given any ε > 0, there
is a natural number N such that |aₙ − l| < ε for all n > N.


Nowhere in this definition do we actually mention that n is a natural number, 
but it is clearly implied by the context. In fact, since (aₙ) is a sequence, n must
be a natural number. 

There is a good reason for conventions of this kind, although at first sight 
they may seem a little sloppy stylistically. The more explicit we are in math- 
ematics, the more symbols we need. If making everything explicit is taken to 
ridiculous lengths, the page gets so cluttered with symbols that it gets difficult 
to read the overall meaning because of the mass of detail. It then becomes a 
question of judgement and mathematical style to select symbols that express 
the ideas as clearly and succinctly as possible. On some occasions it may be 
appropriate to ignore standard conventions. For example, in a given context 
it may be appropriate to use the letter x for an integer. 


All and Some 


Given a predicate P(x) that makes sense for elements in a set S, we can ask
whether it is true for all elements in S, or whether it is true for at least some
elements in S. We can then make the statements ‘for all x ∈ S, P(x) is true’ or
‘for some x ∈ S, P(x) is true’. These statements can, of course, themselves be
true or false. We write them in symbols using the ‘universal quantifier’ ∀ and
the ‘existential quantifier’ ∃.
∀x ∈ S : P(x) is read ‘for all x ∈ S, P(x)’,
∃x ∈ S : P(x) is read ‘there exists (at least) one x ∈ S such that P(x)’.


If the predicate P(x) is true for all x ∈ S, then the statement ∀x ∈ S : P(x)
is true; otherwise it is false. On the other hand, when P(x) is true for at least
one x ∈ S, then the statement ∃x ∈ S : P(x) is true; otherwise it is false.

The symbols ∀x ∈ S : P(x) can be read as ‘for every x ∈ S, P(x)’ or ‘for each
x ∈ S, P(x)’, or in any grammatically equivalent way. Similarly, ∃x ∈ S : P(x)
can be translated as ‘there is an x ∈ S such that P(x)’, ‘for some x ∈ S, P(x)’,
and so on.

In ordinary language there are subtle overtones in a statement like ‘some
politicians are honest’. We get the message that some are honest, but we also
tend to assume that some are not, because otherwise the statement would
have been ‘all politicians are honest’. Mathematical usage carries no such
implication. The statement ‘for some x ∈ S, P(x)’ does not have the
connotation that there exist certain other x ∈ S for which P(x) is false. Consider the
statement:


‘some of the numbers 3677, 601, 19, 257, 11119 are prime’.


Since 19 is prime, the statement is true. The other numbers are also prime, 
but this does not invalidate the conclusion. At the other end of the scale, 
‘some’ may mean only one; for instance 


‘some of the numbers 2, 3, 5, 7, 11 are even’ 


is also true, because 2 is even. This convention greatly simplifies the task of
verifying the truth of ‘∃x ∈ S : P(x)’. We need only find a single value of x
for which P(x) is true.
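The convention can be tried out on the two lists above. The following Python sketch is ours, not the book's. `is_prime` is a hypothetical helper using trial division, and the built-in `any` plays the role of ‘for some’ over a finite list.

```python
def is_prime(n: int) -> bool:
    """Trial division: adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

first_list = [3677, 601, 19, 257, 11119]
second_list = [2, 3, 5, 7, 11]

# 'Some of the numbers are prime': one witness, such as 19, is enough.
print(any(is_prime(n) for n in first_list))    # True
# 'Some' does not exclude 'all': here every number happens to be prime.
print(all(is_prime(n) for n in first_list))    # True
# 'Some of the numbers are even' is true with a single witness.
print([n for n in second_list if n % 2 == 0])  # [2]
```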


Examples 6.3: 


(i) ∀x ∈ ℝ : x² ≥ 0 means ‘for every x ∈ ℝ, x² ≥ 0’ or ‘the square of
any real number is non-negative’, or some grammatical equivalent.
This is a true statement.

(ii) ∃x ∈ ℝ : x² ≥ 0 reads ‘for some x ∈ ℝ, x² ≥ 0’ or ‘there exists a real
number whose square is non-negative’. This is also true.

(iii) ∀x ∈ ℝ : x² > 0 is false (since 0² = 0).
(iv) ∃x ∈ ℝ : x² > 0 is true (since 1² > 0. In this case there are a lot of
other elements of ℝ besides 1 which would do just as well.)

(v) ∃x ∈ ℝ : x² < 0 is false.
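On a finite set, ‘for all’ and ‘for some’ correspond to Python’s built-in `all` and `any`. The sketch below checks finite analogues of Examples 6.3 on a hypothetical sample S = {−5, . . . , 5}. A finite check illustrates what the quantifiers mean, but it proves nothing about all of ℝ.

```python
def for_all(S, P) -> bool:
    """The statement 'for all x in S, P(x)' for a finite collection S."""
    return all(P(x) for x in S)

def exists(S, P) -> bool:
    """The statement 'there exists x in S such that P(x)' for a finite S."""
    return any(P(x) for x in S)

S = range(-5, 6)  # hypothetical finite sample standing in for R

print(for_all(S, lambda x: x * x >= 0))  # True,  compare (i)
print(exists(S, lambda x: x * x >= 0))   # True,  compare (ii)
print(for_all(S, lambda x: x * x > 0))   # False, compare (iii): 0 * 0 = 0
print(exists(S, lambda x: x * x > 0))    # True,  compare (iv): x = 1 works
print(exists(S, lambda x: x * x < 0))    # False, compare (v)
```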
If the symbol x is replaced throughout a quantified statement by another
symbol, then we regard the new statement as being equivalent to the old.

$$\exists x \in S : P(x) \quad\text{means the same as}\quad \exists y \in S : P(y).$$

For instance, ∃x ∈ ℝ : x² > 0 is equivalent to ∃y ∈ ℝ : y² > 0. Both statements
say ‘there exists a real number whose square is positive’.


More Than One Quantifier 


Given a predicate in two or more variables, we can use a quantifier for each
variable. For example, if P(x, y) is the predicate ‘x + y = 0’, then the statement
∀x ∈ ℝ ∃y ∈ ℝ : P(x, y) is read as ‘for every x ∈ ℝ there is a y ∈ ℝ such that
x + y = 0’. It is standard logical practice to put all the quantifiers at the front of
the predicate and read them in order; for instance, ∃y ∈ ℝ ∀x ∈ ℝ : P(x, y)
reads as ‘there is a y ∈ ℝ such that for all x ∈ ℝ, x + y = 0’.

The order matters. Of the two statements given, ∀x ∈ ℝ ∃y ∈ ℝ : P(x, y)
is true, because for each x ∈ ℝ, we can take y = −x to get x + y = 0. However,
∃y ∈ ℝ ∀x ∈ ℝ : P(x, y) is false, because it asserts the existence of y ∈ ℝ that
satisfies x + y = 0 for every x ∈ ℝ. No single value of y will do.
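The effect of quantifier order can be seen on a finite set that, like ℝ, contains −x whenever it contains x. The sketch assumes the hypothetical set S = {−3, . . . , 3}. Nesting `all` and `any` in the order of the quantifiers reproduces the two readings.

```python
S = range(-3, 4)  # hypothetical finite set: contains -x whenever it contains x

def P(x: int, y: int) -> bool:
    return x + y == 0

# For every x in S there is a y in S with x + y = 0 (take y = -x).
forall_exists = all(any(P(x, y) for y in S) for x in S)
# There is a single y in S with x + y = 0 for every x in S.
exists_forall = any(all(P(x, y) for x in S) for y in S)

print(forall_exists, exists_forall)  # True False
```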

Getting the order of the quantifiers right in such a statement is a vital part 
of clear mathematical thinking. It is a common error to get it wrong (and 
not just among beginners). This problem can arise when we try to write a 
clear but formal logical statement in flowing prose. The word order may be 
changed around to give a more euphonious sound to the language, some- 
times at the expense of logical clarity. In particular the quantifiers may be 
embedded in the middle of the sentence instead of all coming at the
beginning. We have already done this a few lines above when we wrote ‘. . . it
asserts the existence of y ∈ ℝ that satisfies x + y = 0 for every x ∈ ℝ’.

Consider the statement ‘every non-zero rational number has a rational
inverse’. What we mean here is ‘given x ∈ ℚ where x ≠ 0, there is an element
y ∈ ℚ such that xy = 1’. This is, of course, true; if x = p/q where p, q are
integers with p ≠ 0, then we can take y = q/p. Written in logical language,
the statement becomes

$$\forall x \in \mathbb{Q}\ (x \neq 0)\ \exists y \in \mathbb{Q} : xy = 1.$$


A mathematician might change the order and say ‘there’s a rational inverse
for every non-zero rational number’ to convey the same idea, even though
this kind of statement could be misinterpreted. You can help matters by mak- 
ing sure that the meaning of your written mathematics is as clear as you can 
possibly make it. 
The ambiguity only arises when the quantifiers involved are different. If
they are the same, there is no such problem. For instance, given the predicate
P(x, y): (x + y)² = x² + 2xy + y², the two statements

$$\forall x \in \mathbb{R}\ \forall y \in \mathbb{R} : P(x, y)$$
and
$$\forall y \in \mathbb{R}\ \forall x \in \mathbb{R} : P(x, y)$$

both amount to the same thing: ‘for all x, y ∈ ℝ, (x + y)² = x² + 2xy + y²’,
which is of course a true statement.

If the variables involved come from the same set, as in this case, we usually
simplify the notation, writing ∀x, y ∈ ℝ : P(x, y). The same happens with the
existential quantifier. For instance if P(x, y) is ‘x, y are irrational and x + y is
rational’, then ∃x ∈ ℝ∖ℚ ∃y ∈ ℝ∖ℚ : P(x, y) and ∃y ∈ ℝ∖ℚ ∃x ∈ ℝ∖ℚ : P(x, y)
both say ‘there exist two real numbers x, y which are irrational but whose sum
is rational’. (This is a true statement since √2 and −√2 are irrational, but 0
is rational.) It may also be written as ∃x, y ∈ ℝ∖ℚ : P(x, y).

There is another minor pitfall in written mathematics. The universal quan- 
tifier is not always explicitly written; often it is implied by the context. Take 
another look at the definition of convergence of a sequence on page 34: 


A sequence (aₙ) of real numbers tends to a limit l if, given any ε > 0, there is a
natural number N such that |aₙ − l| < ε for all n > N.


This is quite a mouthful, and is often cut down to make it as brief as possible.
A more precise definition should begin ‘for all ε ∈ ℝ, ε > 0 . . .’. One of the
little words that often gets lost is ‘all’. A typical shortened statement is:

Given ε > 0, ∃N such that n > N implies |aₙ − l| < ε.

You will find a lot of minor variations on this definition, but in essence they
all mean the same thing. If you understand this, you are a long way along
the road to understanding the nature of the problem of communicating
mathematics with the appropriate degree of precision.


Negation 


On page 123 we introduced the negation ¬P of a statement P. The truth value
of ¬P can be represented in the following table (called a truth table):

P ¬P
t f
f t

Reading along the rows, this says that when P is true, ¬P is false, and
conversely. The symbol ¬ is called a modifier because it modifies a statement,
changing its meaning and its truth value.

In the same way, a predicate can be modified using ¬. If P(x) is ‘x > 5’,
then ¬P(x) is ‘x > 5 is false’ or, equivalently, ‘x ≤ 5’.

The negation of a statement involving quantifiers leads to an interesting
situation. It is easy to see that the statement ‘∀x ∈ S : P(x) is false’ is the same
as ‘∃x ∈ S : ¬P(x)’. (If it is false that P(x) is true for all x ∈ S, then there must
exist an x ∈ S for which P(x) is false, in which case ¬P(x) is true.) That is,

$$\neg\,\forall x \in S : P(x) \quad\text{means the same as}\quad \exists x \in S : \neg P(x). \tag{1}$$
Similarly,
$$\neg\,\exists x \in S : P(x) \quad\text{means the same as}\quad \forall x \in S : \neg P(x). \tag{2}$$


Statement (2) tells us that ‘there is no x for which P(x) is true’ is the same as
‘for every x ∈ S, P(x) is false’. An example of (2) is:

¬∃x ∈ ℝ : x² < 0 . . . there is no x ∈ ℝ such that x² < 0,

∀x ∈ ℝ : ¬(x² < 0) . . . every x ∈ ℝ satisfies x² ≥ 0.


These two principles are vital in mathematical arguments. Freely translated,
(1) says ‘to show that a predicate P(x) is not true for all x ∈ S, it is only
necessary to exhibit one x for which P(x) is false’. Similarly, (2) asserts ‘to
show that no x ∈ S exists for which P(x) is true, it is necessary to prove P(x) false
for every x ∈ S’.
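Rules (1) and (2) can be checked exhaustively on a small finite set. This works because a predicate on S is determined by its truth function, and there are only 2^|S| of those. The following is a finite sanity check in Python, not a proof for arbitrary S, and the function name is our own.

```python
from itertools import product

def negation_rules_hold(S) -> bool:
    """Check rules (1) and (2) for every predicate on the finite set S."""
    for values in product([True, False], repeat=len(S)):
        truth = dict(zip(S, values))  # one predicate, given by its truth function
        P = truth.__getitem__
        rule1 = (not all(P(x) for x in S)) == any(not P(x) for x in S)
        rule2 = (not any(P(x) for x in S)) == all(not P(x) for x in S)
        if not (rule1 and rule2):
            return False
    return True

print(negation_rules_hold([0, 1, 2]))  # True
```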

As rules of thumb for negating statements involving quantifiers, these
ideas come into their own when several quantifiers are involved. A typical
instance is the definition of convergence of a sequence:

$$\forall \varepsilon > 0\ \exists N \in \mathbb{N}\ \forall n > N\ \ (|a_n - l| < \varepsilon).$$

To show that (aₙ) does not tend to the limit l, we have to prove the negation
of this statement:

$$\neg\big[\forall \varepsilon > 0\ \exists N \in \mathbb{N}\ \forall n > N\ \ (|a_n - l| < \varepsilon)\big].$$


6 MATHEMATICAL LOGIC | 129 


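Using principles (1) and (2) this becomes
$$\exists \varepsilon > 0\ \neg\big[\exists N \in \mathbb{N}\ \forall n > N\ \ (|a_n - l| < \varepsilon)\big],$$
then
$$\exists \varepsilon > 0\ \forall N \in \mathbb{N}\ \neg\big[\forall n > N\ \ (|a_n - l| < \varepsilon)\big],$$
then
$$\exists \varepsilon > 0\ \forall N \in \mathbb{N}\ \exists n > N\ \ \neg(|a_n - l| < \varepsilon),$$
which translates finally into:
$$\exists \varepsilon > 0\ \forall N \in \mathbb{N}\ \exists n > N\ \ (|a_n - l| \geq \varepsilon).$$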


Therefore, to verify that (aₙ) does not converge to l, we have to prove that
there is some specific ε > 0 such that for any natural number N there is
always a larger natural number n > N with |aₙ − l| ≥ ε.
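For a concrete case, take the hypothetical sequence aₙ = (−1)ⁿ and the candidate limit l = 1. The final form of the negation asks for one ε, here ε = 1, and, for each N, some n > N with |aₙ − l| ≥ ε. The odd number n = 2N + 1 always works, since then aₙ = −1. The sketch below checks this only for N = 1, . . . , 1000; the argument for every N is the one just given.

```python
def a(n: int) -> int:
    """Hypothetical example sequence a_n = (-1)**n."""
    return (-1) ** n

limit, eps = 1, 1  # candidate limit l and the specific epsilon > 0

def witness(N: int) -> int:
    """An n > N with |a_n - l| >= eps: 2N + 1 is odd, so a_n = -1."""
    return 2 * N + 1

# Finite check of 'for all N there is n > N with |a_n - l| >= eps', N = 1..1000.
ok = all(witness(N) > N and abs(a(witness(N)) - limit) >= eps
         for N in range(1, 1001))
print(ok)  # True
```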

Much of the difficulty in a subject like mathematical analysis is in manipu- 
lating statements like this. Doing so becomes much easier with a little experi- 
ence and practice, keeping the principles for negating quantifiers in mind. 


Logical Grammar: Connectives 


In mathematics we give standard conjunctions ‘and’, ‘or’, and so on very spe- 
cific meanings. For instance, ‘or’ is used in the inclusive sense: if P, Q are 
statements then P or Q is a statement that is regarded as true provided that 
one or both of P, Q is true. We can represent this by a truth table: 


P Q P or Q
t t t 
t f t 
f t t 
f f f 


This is read along the horizontal rows. For example, the second row says that 
if P is true, Q is false, then P or Q is true. 

Other conjunctions in regular use in mathematics are ‘and’, ‘implies’, and
‘if and only if’. The symbols are & (and), ⇒ (implies), ⇔ (if and only if).
They have the following truth tables:

P Q P & Q
t t t
t f f
f t f
f f f

P Q P ⇒ Q
t t t
t f f
f t t
f f t

P Q P ⇔ Q
t t t
t f f
f t f
f f t


These tables are read in the same way as the table for ‘or’. The first and last
of these are fairly obvious: P & Q is regarded as true only when both P and Q
are true; P ⇔ Q is regarded as true only when P and Q each have the same
truth value.

The interesting table is the one for P ⇒ Q. If P is true, then the first and
second lines say that the implication P ⇒ Q is true when Q is true and false
when Q is false. This shows that the truth of P ⇒ Q means that if P is true,
then Q must be true. This is the normal interpretation of the implication
sign ⇒, and for this reason P ⇒ Q is often interpreted as ‘if P, then Q’.
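The four truth tables can be reproduced mechanically. In this Python sketch the helper names `implies`, `iff` and `column` are our own. Rows are generated in the book’s order tt, tf, ft, ff, and each final column is printed as a string of t’s and f’s.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """P => Q is false only when P is true and Q is false."""
    return (not p) or q

def iff(p: bool, q: bool) -> bool:
    """P <=> Q is true exactly when P and Q have the same truth value."""
    return p == q

CONNECTIVES = {
    "P or Q": lambda p, q: p or q,
    "P & Q": lambda p, q: p and q,
    "P => Q": implies,
    "P <=> Q": iff,
}

def tf(value: bool) -> str:
    """Write a truth value using the book's letters t and f."""
    return "t" if value else "f"

def column(op) -> str:
    """Final column of the truth table, rows in the order tt, tf, ft, ff."""
    return "".join(tf(op(p, q)) for p, q in product([True, False], repeat=2))

for name, op in CONNECTIVES.items():
    print(f"{name:8} {column(op)}")
```

Reading each printed string from left to right gives the final columns tttf, tfff, tftt and tfft of the tables above.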

What of the situation when P is false? The third and fourth lines say that
whether Q is true or false, P ⇒ Q is always regarded as true. In many places
a lot of philosophical nonsense is talked about this: ‘How can the falsehood
of P imply the truth of Q?’

The reason for this situation can be seen in the standard mathematical
practice of using connectives with predicates rather than statements. If P(x)
and Q(x) are predicates both valid for x ∈ S, then we can use the
connectives in the manner above to get predicates P(x) or Q(x), P(x) & Q(x), etc. In
particular, the predicate P(x) ⇒ Q(x) has the stated truth table. Sometimes
P(x) ⇒ Q(x) is true for all x ∈ S. This is where the truth table comes into its
own. For example when P(x) is ‘x > 5’ and Q(x) is ‘x > 2’, every
mathematician would agree that P(x) ⇒ Q(x) is true; reading this as
‘if x > 5, then x > 2’, they would not be interested in what happens
when x ≤ 5.

Let us substitute some different values for x and see what happens: 


If x = 4, then P(4) is false and Q(4) is true. 
If x = 1, then P(1) is false and Q(1) is false. 


These are precisely lines three and four of the truth table for ‘⇒’ and
illustrate how the truth table is arrived at. With this interpretation, the truth
table can best be described as follows:


‘P ⇒ Q is true’
means that
(a) ‘If P is true, then Q must be true’;
however,

(b) ‘If P is false then Q may be either true or false, and no conclusion
may be drawn in this case’.
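The example of x > 5 and x > 2 can be checked over a range of integers. The sketch is illustrative only, and the range is an arbitrary choice of ours. It shows that P(x) ⇒ Q(x) comes out true for every x tried, including the values x = 4 and x = 1 where P(x) is false.

```python
def implies(p: bool, q: bool) -> bool:
    """Truth table of P => Q: false only when p is true and q is false."""
    return (not p) or q

def P(x: int) -> bool:
    return x > 5

def Q(x: int) -> bool:
    return x > 2

print(all(implies(P(x), Q(x)) for x in range(-10, 11)))  # True

for x in (4, 1):
    print(x, P(x), Q(x), implies(P(x), Q(x)))
# 4 False True True    (line three of the table)
# 1 False False True   (line four of the table)
```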


Other connectives are possible; for example the ‘exclusive or’ (denoted
here by OR) with truth table:

P Q P OR Q
t t f
t f t
f t t
f f f

P OR Q is true when one, but not both, of P, Q is true.

We could write down truth tables for many other connectives, but these
can all be manufactured by combining the given ones. For instance, exclusive
OR can also be described by (P or Q) & ¬(P & Q). We discuss these ideas in
greater detail below in the section on Formulas for Compound Statements.
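To confirm that exclusive OR can be manufactured from the basic connectives, we can compare the two truth tables row by row. The following is a minimal Python check, and the function names are our own.

```python
from itertools import product

def xor(p: bool, q: bool) -> bool:
    """Exclusive OR: true when exactly one of p, q is true."""
    return p != q

def xor_from_basic(p: bool, q: bool) -> bool:
    """(P or Q) & not(P & Q)."""
    return (p or q) and not (p and q)

rows = list(product([True, False], repeat=2))   # tt, tf, ft, ff
print([xor(p, q) for p, q in rows])             # [False, True, True, False]
print(all(xor(p, q) == xor_from_basic(p, q) for p, q in rows))  # True
```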

Mathematicians do not restrict themselves stylistically to the connectives
just described. They may also use grammatical connectives like ‘but’, ‘since’,
or ‘because’, as fancy takes them. These words are interpreted as
grammatical equivalents for the technical words. For instance the truth table for ‘P
but Q’ is the same as that for ‘P & Q’. The statement ‘√2 is irrational but
(√2)² is rational’ means the same thing as ‘√2 is irrational and (√2)² is
rational’. Similarly, ‘P because Q’ and ‘P since Q’ have the same truth table as
‘Q ⇒ P’. You can make yourself familiar with these variants by looking at a
few examples. (See the exercises at the end of the chapter.)


The Link with Set Theory 


If we apply connectives and the modifier ¬ to predicates in one variable, we
find a simple relationship with set-theoretic notation. Suppose P(x) and Q(x)
are predicates valid on the same set S, and look at the subsets for which the
various compound statements are true. For ‘&’ we obtain:

$$\{x \in S \mid P(x) \,\&\, Q(x)\} = \{x \in S \mid P(x)\} \cap \{x \in S \mid Q(x)\}.$$


[Fig. 6.1 P(x) & Q(x): diagram showing the region where P(x) & Q(x) is true]


Similarly, 


$$\{x \in S \mid P(x) \text{ or } Q(x)\} = \{x \in S \mid P(x)\} \cup \{x \in S \mid Q(x)\}.$$


[Fig. 6.2 Inclusive P(x) or Q(x): diagram with the region where P(x) or Q(x) is true shaded]


This is one reason for using the ‘inclusive or’, corresponding to
set-theoretic union, rather than the ‘exclusive OR’, which corresponds to the
‘symmetric difference’ in set theory, represented by the shaded area in the
diagram:

[Fig. 6.3 Exclusive P(x) OR Q(x): the region where P(x) or Q(x) is true, but not both, is shaded]
The modifier ¬ applied to a single predicate P(x) corresponds to the
set-theoretic complement:

[Fig. 6.4 Not P(x): diagram showing the region where ¬P(x) is true]


When we consider the implication P(x) ⇒ Q(x) we look at the situation a
little differently. We are really interested only in the case where P(x) ⇒ Q(x)
is true for all x.

In this case, if P(x) is true, so must Q(x) be; that is, if a ∈ {x ∈ S | P(x)}
then a ∈ {x ∈ S | Q(x)}, which means {x ∈ S | P(x)} ⊆ {x ∈ S | Q(x)}. The
truth of the statement P(x) ⇒ Q(x) for all x ∈ S corresponds to set-theoretic
inclusion:

[Fig. 6.5 P(x) ⇒ Q(x): the region where P(x) is true lies inside the region where Q(x) is true]

In the same way, P(x) ⇔ Q(x) is true for all x ∈ S if and only if
$$\{x \in S \mid P(x)\} = \{x \in S \mid Q(x)\}.$$
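The correspondence between connectives and set operations can be checked on a finite set. In this sketch the universe S = {1, . . . , 30} and the predicates ‘even’, ‘divisible by 3’ and ‘divisible by 6’ are hypothetical examples. Python’s `&`, `|`, `^` and `-` on sets give intersection, union, symmetric difference and difference, and `<=` tests inclusion.

```python
S = set(range(1, 31))  # hypothetical finite universe {1, ..., 30}

def P(x: int) -> bool:
    return x % 2 == 0  # 'x is even'

def Q(x: int) -> bool:
    return x % 3 == 0  # 'x is divisible by 3'

def R(x: int) -> bool:
    return x % 6 == 0  # 'x is divisible by 6'

A = {x for x in S if P(x)}  # {x in S | P(x)}
B = {x for x in S if Q(x)}  # {x in S | Q(x)}
C = {x for x in S if R(x)}  # {x in S | R(x)}

print({x for x in S if P(x) and Q(x)} == A & B)   # '&'  <-> intersection
print({x for x in S if P(x) or Q(x)} == A | B)    # 'or' <-> union
print({x for x in S if P(x) != Q(x)} == A ^ B)    # 'OR' <-> symmetric difference
print({x for x in S if not P(x)} == S - A)        # 'not' <-> complement in S

# R(x) => Q(x) holds for every x in S, and correspondingly C is a subset of B.
print(all((not R(x)) or Q(x) for x in S), C <= B)
```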


Formulas for Compound Statements 
Using connectives and modifiers we can form more complex statements and
predicates from given ones, for instance (P & Q) or R. This involves three
statements, so the truth table has 2³ = 8 lines:
(The column P & Q is an intermediate calculation.)

P Q R P & Q (P & Q) or R
t t t t t
t t f t t
t f t f t
t f f f f
f t t f t
f t f f f
f f t f t
f f f f f


The symbol ‘(P & Q) or R’ is really a recipe for forming a new statement
or predicate out of three given ones P, Q, R. To emphasise this, we call it a
compound statement formula when P, Q, R stand for unspecified statements
or predicates. When we replace P, Q, R by specific statements, for instance

$$(2 > 3 \,\&\, 2 > 6) \text{ or } 2 > 1,$$

we call it a compound statement. If instead we use specific predicates, we call
the result a compound predicate. For example,

$$(x > 3 \,\&\, x > 6) \text{ or } x > 1$$

is a compound predicate.

Most mathematical proofs involve manipulating compound statements 
and predicates. Brackets are often essential to show how these statements 
and predicates are constructed. For instance P & (Q or R) is different from 
(P & Q) or R. In fact, looking at the seventh line in the above truth table, if P is
false, Q is false, and R is true, then (P & Q) or R is true; but a calculation shows
that in this case P & (Q or R) is false. The same goes for predicates. So we
must take care to put the brackets in the right places whenever ambiguities 
would arise. 
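The eight-line table, and the difference between (P & Q) or R and P & (Q or R), can be generated as follows. This is a sketch in which `itertools.product` lists the rows in the same order as the table above.

```python
from itertools import product

def tf(value: bool) -> str:
    return "t" if value else "f"

# product lists the 2**3 = 8 rows in the same order as the table above.
rows = list(product([True, False], repeat=3))

for i, (p, q, r) in enumerate(rows, start=1):
    left = (p and q) or r    # (P & Q) or R
    right = p and (q or r)   # P & (Q or R)
    print(i, tf(p), tf(q), tf(r), tf(left), tf(right))

# (P & Q) & R and P & (Q & R) agree on every row, so brackets may be dropped.
print(all(((p and q) and r) == (p and (q and r)) for p, q, r in rows))  # True
```

The seventh printed row is `7 f f t t f`, which is the line where the two bracketings disagree.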

Sometimes, however, it is permissible to omit brackets. For instance,
(P & Q) & R has the same truth table as P & (Q & R), so it would not cause
any problem to write just P & Q & R.

When we build up compound statement formulas using connectives and
modifiers, we often find formulas that look different but have the same truth
table. An example is the two statements P ⇒ Q and (¬Q) ⇒ (¬P):

P Q P ⇒ Q
t t t
t f f
f t t
f f t

Intermediate calculations:
P Q ¬P ¬Q (¬Q) ⇒ (¬P)
t t f f t
t f f t f
f t t f t
f f t t t

We can summarise the result as

P Q (¬Q) ⇒ (¬P)
t t t
t f f
f t t
f f t

and the final column for both formulas is t f t t.
In this case the compound statement formulas are said to be logically
equivalent. Denoting two compound statement formulas by S₁, S₂, we write

$$S_1 \equiv S_2$$

for logical equivalence. For instance, our result above can be expressed as
$$P \Rightarrow Q \;\equiv\; (\neg Q) \Rightarrow (\neg P).$$
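Logical equivalence amounts to equality of truth tables, which a short function can test. In this Python sketch `equivalent` is our own helper, and formulas are Python functions of n Boolean arguments. For contrast it also tests the converse Q ⇒ P, which is not equivalent to P ⇒ Q.

```python
from itertools import product
from typing import Callable

Formula = Callable[..., bool]

def implies(p: bool, q: bool) -> bool:
    return (not p) or q

def equivalent(f: Formula, g: Formula, n: int) -> bool:
    """f and g are logically equivalent: their tables agree on all 2**n rows."""
    return all(f(*row) == g(*row) for row in product([True, False], repeat=n))

def p_implies_q(p: bool, q: bool) -> bool:
    return implies(p, q)

def contrapositive(p: bool, q: bool) -> bool:
    return implies(not q, not p)

def converse(p: bool, q: bool) -> bool:
    return implies(q, p)

print(equivalent(p_implies_q, contrapositive, 2))  # True
print(equivalent(p_implies_q, converse, 2))        # False (differ when P = t, Q = f)
```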


Sometimes two compound statement formulas can be considered to be
logically equivalent, even though they are composed of different symbols.
This happens when changing the truth value of a particular symbol does
not affect the final result. For instance, P & (¬P) is always false. The truth
table for (P & (¬P)) or (¬Q) has the same truth values as that for (¬Q),
regardless of the truth value of P. One way of looking at this, typical of the
mathematical fraternity, is to think of (¬Q) as a function of both P and Q, so
that its truth table becomes


P Q (P & (¬P)) or (¬Q)
t t f
t f t
f t f
f f t

P Q ¬Q
t t f
t f t
f t f
f f t


By this formal device we can legitimately write

$$\neg Q \;\equiv\; (P \,\&\, (\neg P)) \text{ or } (\neg Q).$$


Definition 6.4: A compound statement formula is a tautology if it is true 
regardless of the truth values of its constituent statement symbols. 


Typical examples of tautologies are: 


(i) P or (¬P)
(ii) P ⇒ (P or Q)
(iii) (P & Q) ⇒ P
(iv) (P ⇒ Q) ⇔ ((¬Q) ⇒ (¬P)).


Check that the truth tables for these always yield the value t. 
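This check is mechanical. In the following Python sketch the helper names are our own. `is_tautology` evaluates a formula on every row of its truth table, and `is_contradiction` does the same for Definition 6.5.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    return (not p) or q

def iff(p: bool, q: bool) -> bool:
    return p == q

def is_tautology(formula, n: int) -> bool:
    """True when the formula (a function of n truth values) is t on every row."""
    return all(formula(*row) for row in product([True, False], repeat=n))

def is_contradiction(formula, n: int) -> bool:
    """True when the formula is f on every row (Definition 6.5)."""
    return not any(formula(*row) for row in product([True, False], repeat=n))

EXAMPLES = {
    "(i)   P or (¬P)": (lambda p: p or not p, 1),
    "(ii)  P ⇒ (P or Q)": (lambda p, q: implies(p, p or q), 2),
    "(iii) (P & Q) ⇒ P": (lambda p, q: implies(p and q, p), 2),
    "(iv)  (P ⇒ Q) ⇔ ((¬Q) ⇒ (¬P))": (
        lambda p, q: iff(implies(p, q), implies(not q, not p)), 2),
}

for name, (formula, n) in EXAMPLES.items():
    print(name, is_tautology(formula, n))          # True for each of (i)-(iv)

print(is_contradiction(lambda p: p and not p, 1))  # True: P & (¬P)
```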


Definition 6.5: If a compound statement formula always takes truth value
f, regardless of the truth values of its constituent statement symbols, then it 
is a contradiction. 


For example, P & (¬P) is a contradiction.

Any two tautologies are logically equivalent, and any two contradictions
are logically equivalent. Moreover, a compound statement formula S is a
tautology if and only if ¬S is a contradiction.
It is useful to use the symbol T for a tautology and C for a contradiction.
We then get some interesting results. For example, (¬P) ⇒ C is logically
equivalent to P. The truth tables are

P (¬P) ⇒ C
t t
f f

P P
t t
f f


When calculating the first table, remember that C always has truth value f.
If we replace C by a specific contradiction, say Q & (¬Q), we will still get
the same result:

P Q (¬P) ⇒ (Q & (¬Q))
t t t
t f t
f t f
f f f

P Q P
t t t
t f t
f t f
f f f

You should check all the intermediate calculations in the first table to get a
feeling for what is happening.

Instead of comparing the truth tables of two compound statement
formulas S₁ and S₂ to check logical equivalence, we can look at the single table for
S₁ ⇔ S₂. If S₁ is logically equivalent to S₂, then S₁ ⇔ S₂ is a tautology, and
vice versa. For example, the logical equivalence of P ⇒ Q and (¬Q) ⇒ (¬P)
corresponds to the fact that [P ⇒ Q] ⇔ [(¬Q) ⇒ (¬P)] is a tautology.
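The results about C can be confirmed row by row. In the following sketch (names ours) the contradiction C is represented simply by the constant `False`, matching the remark that C always has truth value f.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    return (not p) or q

def rows(n: int):
    return list(product([True, False], repeat=n))

C = False  # a contradiction always has truth value f

# (¬P) ⇒ C has the same truth table as P.
print([implies(not p, C) for (p,) in rows(1)])  # [True, False]

# Replacing C by the specific contradiction Q & (¬Q) gives the same result.
print(all(implies(not p, q and not q) == p for p, q in rows(2)))  # True

# S1 ≡ S2 exactly when S1 ⇔ S2 is a tautology: here S1 = P ⇒ Q, S2 = (¬Q) ⇒ (¬P).
print(all(implies(p, q) == implies(not q, not p) for p, q in rows(2)))  # True
```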


Logical Deductions 
The overall strategy behind a proof often consists of proving not the truth
of a given statement, but the truth of a logically equivalent one. Important
examples are as follows.
Examples 6.6:

(1) The contrapositive: P ⇒ Q ≡ (¬Q) ⇒ (¬P). To prove P ⇒ Q, we
establish the truth of (¬Q) ⇒ (¬P).

(2) Proof by contradiction: P ≡ (¬P) ⇒ C, where C is a contradiction. To
prove P, we establish the truth of (¬P) ⇒ C.

(3) ‘If and only if’ proof: P ⇔ Q ≡ (P ⇒ Q) & (Q ⇒ P). To prove
P ⇔ Q, we prove both P ⇒ Q and Q ⇒ P.

(4) ‘If and only if’ version II: P ⇔ Q ≡ (P ⇒ Q) & ((¬P) ⇒ (¬Q)).
To prove P ⇔ Q, we establish the truth of both P ⇒ Q and (¬P) ⇒
(¬Q).


We establish the truth of a given statement from known ones by seeing
how the new statement is made up, and using truth tables. For instance, we
might know that P is true and that (¬Q) ⇒ (¬P) is true. From these facts we
can deduce that Q must be true. The given statements might be compound
ones, like (¬Q) ⇒ (¬P), and although we know that the total statement is
true, we may have no information on the truth of its constituents. Thus we
might know that (¬Q) ⇒ (¬P) is true, but have no knowledge of the truth
values of P or of Q. This still allows us to make some deductions; for example
if (¬Q) ⇒ (¬P) is true, then we know that the equivalent statement P ⇒ Q
is true.

Here are a few situations in which we can deduce the truth of the statement 
in the second column from those in the first. 


If these statements are true . . .        . . . then this must be true
P, (¬Q) ⇒ (¬P)                            Q
(¬P) ⇒ C (contradiction)                  P
P, P ⇒ Q                                  Q
P ⇒ Q, Q ⇒ R                              P ⇒ R
P or Q, ¬P                                Q
P & Q                                     P or Q
P ⇒ Q, Q ⇒ P                              P ⇔ Q
P₁, . . . , Pₙ                            P₁ & . . . & Pₙ
P₁, . . . , Pₙ, (P₁ & . . . & Pₙ) ⇒ Q     Q


This table can be continued indefinitely. To obtain a new entry, write a
number of compound statement formulas S₁, . . . , Sₙ in the left-hand
column. In the corresponding position in the right-hand column, put any
compound statement formula D whose truth is ensured when S₁, . . . , Sₙ are
true. This involves looking at the truth tables for S₁, . . . , Sₙ, D, but we can
formulate the condition in one composite table by considering the formula
(S₁ & . . . & Sₙ) ⇒ D. If this is a tautology then the truth of S₁, . . . , Sₙ ensures
the truth of D, as required.

A tautology of the form (S₁ & . . . & Sₙ) ⇒ D is called a rule of inference.
Given such a rule of inference, and substituting actual statements into the
compound statement formulas involved, if S₁, . . . , Sₙ are true then we may
infer that D is true.
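To test whether a proposed entry is a rule of inference, check that (S₁ & . . . & Sₙ) ⇒ D is a tautology. The following Python sketch uses our own naming and tests three rows of the table above. The last call tests an invalid pattern (from Q and P ⇒ Q, concluding P), which fails on the row P = f, Q = t.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    return (not p) or q

def is_rule_of_inference(premises, conclusion, n: int) -> bool:
    """(S1 & ... & Sk) => D must be true on every row of the truth table."""
    return all(
        implies(all(s(*row) for s in premises), conclusion(*row))
        for row in product([True, False], repeat=n)
    )

# From P and P => Q, infer Q.
modus_ponens = is_rule_of_inference(
    [lambda p, q: p, lambda p, q: implies(p, q)], lambda p, q: q, 2)
# From P => Q and Q => R, infer P => R.
chain = is_rule_of_inference(
    [lambda p, q, r: implies(p, q), lambda p, q, r: implies(q, r)],
    lambda p, q, r: implies(p, r), 3)
# From P or Q and not P, infer Q.
cases = is_rule_of_inference(
    [lambda p, q: p or q, lambda p, q: not p], lambda p, q: q, 2)
# Not a rule: from Q and P => Q one may NOT infer P (fails when P = f, Q = t).
invalid = is_rule_of_inference(
    [lambda p, q: q, lambda p, q: implies(p, q)], lambda p, q: p, 2)

print(modus_ponens, chain, cases, invalid)  # True True True False
```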

When the statements concerned involve quantified predicates we must
look at how they are composed to see if the truth of one statement is a natural
consequence of given ones. In a simple case we might know that ∀x ∈ S : P(x)
is true, and infer that ∃x ∈ S : P(x) is also true (provided that S is non-empty).
Given the truth of ∀x ∈ S : P(x) and ∀x ∈ S : Q(x) we can deduce a whole host
of statements, including

∀x ∈ S : P(x) & Q(x),
[∀x ∈ S : P(x)] or [∀x ∈ S : Q(x)],
P(a) & Q(b) where a, b ∈ S,

and so on. Again, we can make a list of deductions that can be made from
statements involving quantified predicates.


If these statements are true . . .          . . . then this must be true
∀x ∈ S : P(x), ∀x ∈ S : Q(x)               ∀x ∈ S : [P(x) & Q(x)]
∀x ∈ S : P(x)                              ∃x ∈ S : P(x)   (S non-empty)
∀x ∈ S : P(x)                              P(a)   (a ∈ S)
P(a)   (a ∈ S)                             ∃x ∈ S : P(x)
¬[∀x ∈ S : P(x)]                           ∃x ∈ S : [¬P(x)]
¬[∃x ∈ S : P(x)]                           ∀x ∈ S : [¬P(x)]
∃x ∈ S ∀y ∈ S ∀z ∈ M : ¬[P(x, y, z)]       ¬[∀x ∈ S ∃y ∈ S ∃z ∈ M : P(x, y, z)]


Again, this list could easily be extended. In the left-hand column we put
statements S₁, . . . , Sₙ, which may involve quantified predicates; in the
right-hand column we put a statement D whose truth follows when all of S₁, . . . , Sₙ
are true. Again this can be formulated as the requirement that the single
statement (S₁ & . . . & Sₙ) ⇒ D must be true.
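Two entries of the quantified table can be illustrated on finite sets. The illustration also shows a caveat: the inference from ∀ to ∃ needs S to be non-empty. In this sketch `S`, `P` and `Q` are hypothetical examples. Python’s `all` of an empty sequence is `True` while `any` is `False`, which is exactly why ∀x ∈ S : P(x) does not yield ∃x ∈ S : P(x) when S = ∅.

```python
def for_all(S, P) -> bool:
    return all(P(x) for x in S)

def exists(S, P) -> bool:
    return any(P(x) for x in S)

def P(x: int) -> bool:
    return x > 0

def Q(x: int) -> bool:
    return x < 20

S = range(1, 11)

# From 'for all x, P(x)' and 'for all x, Q(x)' infer 'for all x, P(x) & Q(x)'.
print(for_all(S, P) and for_all(S, Q), for_all(S, lambda x: P(x) and Q(x)))  # True True

# From 'for all x, P(x)' infer 'there exists x with P(x)', since S is non-empty ...
print(exists(S, P))  # True

# ... but on the empty set the universal statement is (vacuously) true
# while the existential one is false.
empty = range(0)
print(for_all(empty, P), exists(empty, P))  # True False
```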

In this book our main aim is not to proliferate more and more complex
logical statements; rather, it is to seek simpler ways of expressing complicated ideas
to make proofs easier to read and write.
Proof

In practice, when seeking a proof of some mathematical statement, we start
from a number of statements H₁, . . . , Hᵣ (called the hypotheses) and attempt
to deduce the truth of a statement D. The process may become quite
involved, with the introduction of other subsidiary statements. For this reason
we perform the process in a number of steps by writing down a sequence of
statements L₁, . . . , Lₙ, where Lₙ = D and, for each m = 1, 2, . . . , n, the
statement Lₘ is either one of the hypotheses H₁, . . . , Hᵣ, or a statement whose
truth can be deduced from the truth of L₁, . . . , Lₘ₋₁. Therefore L₁ must be one
of the hypotheses, and each successive statement L₂, . . . , Lₙ must either be a
true deduction from previous Lⱼ or a hypothesis. These conditions clearly imply
that the last line D is true.

The truth of the deduction of Lₘ from previous Lⱼ is checked, as before,
by verifying the truth of the statement (L₁ & . . . & Lₘ₋₁) ⇒ Lₘ. If
Lₘ is a hypothesis, it follows immediately from the truth table for ⇒ that
(L₁ & . . . & Lₘ₋₁) ⇒ Lₘ is true; but when Lₘ is not a hypothesis we need to
check more fully.

When the final deduction D is of the form P ⇒ Q, mathematicians often
vary the prescription by writing down lines L₁, . . . , Lₙ with P as the first line
L₁ and Q as the last line Lₙ. Here each intermediate line must either be a
hypothesis, or its truth must follow from previous lines, as before. Some
lines may well be predicates; again the important thing is to ensure that
(L₁ & . . . & Lₘ₋₁) ⇒ Lₘ is always true.


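Example 6.7: Given hypotheses

$H_1$: $5 > 2$
$H_2$: $\forall x, y, z \in \mathbb{R}: (x > y) \;\&\; (y > z) \Rightarrow (x > z)$,

we can write down the proof that $(x > 5) \Rightarrow (x > 2)$ as

$L_1$: $x > 5$
$L_2$: $5 > 2$
$L_3$: $\forall x, y, z \in \mathbb{R}: (x > y) \;\&\; (y > z) \Rightarrow (x > z)$
$L_4$: $x > 2$.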
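For comparison, here is a sketch of the same deduction checked by the Lean 4 proof assistant, using only its core library. As an assumption made purely so that no extra library is needed, it works over the natural numbers rather than $\mathbb{R}$. The hypothesis `h` plays the role of $L_1$, the fact $2 < 5$ plays the role of $H_1$, and the core lemma `Nat.lt_trans` supplies the transitivity expressed by $H_2$:

```lean
-- Example 6.7, restricted (by assumption) to x : Nat.
-- `h` is the first line L₁ : x > 5, which Lean reads as 5 < x.
example (x : Nat) (h : x > 5) : x > 2 :=
  -- H₁ : 2 < 5 is checked by evaluation; transitivity of < then gives 2 < x.
  Nat.lt_trans (by decide : 2 < 5) h
```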


Although this particular deduction is not exactly mind-boggling, it embodies the general prescription for a proof, which we crystallise as follows:
Definition 6.8: Let $P$ and $Q$ be statements or predicates. A proof of the statement $P \Rightarrow Q$, given the statements $H_1, \ldots, H_r$, consists of a finite number of statements

$$L_1 = P, \quad L_2, \quad \ldots, \quad L_n = Q,$$

where each $L_m$ ($2 \le m \le n$) is either a hypothesis $H_s$ ($1 \le s \le r$) or a statement or predicate such that

$$(L_1 \;\&\; \cdots \;\&\; L_{m-1}) \Rightarrow L_m$$

is a true statement for each $m$ with $2 \le m \le n$.
We call the $L_j$ the lines of the proof.


Under these conditions, if $P$ is true, then each succeeding line must also be true, so in particular $Q$ is true. The truth table for $\Rightarrow$ then shows that $P \Rightarrow Q$ is true.

It is worth looking at what happens when $P$ is false. This could easily occur if $P$ is a predicate, when substituting a particular value for the variable renders the predicate false. Thus in the above example, $x > 5$ is false when $x = 1$, in which case line $L_4$ becomes $1 > 2$, which is also false. On the other hand, if $x = 3$ then $L_1$ is false, but $L_4$ is $3 > 2$, which is true. In short, if $P$ is false we can draw no conclusions about the validity of succeeding statements $L_j$: the only thing we are certain of is that the compound statement $P \Rightarrow Q$ is true. This happens because, although we know that the deduction $(L_1 \;\&\; \cdots \;\&\; L_{m-1}) \Rightarrow L_m$ is true, the falsity of $L_1 = P$ leaves $L_m$ free to be false.
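To see this concretely, here is a small check of Example 6.7 for a few sample values of $x$ (the values 1, 3 and 6 are our own choice). Checking finitely many values illustrates the truth table for $\Rightarrow$; it does not prove the implication for all $x$.

```python
def implies(p, q):
    # Truth table for "p => q": false only when p is true and q is false.
    return (not p) or q

rows = []
for x in [1, 3, 6]:
    P = x > 5   # the first line L_1
    Q = x > 2   # the last line L_4
    rows.append((x, P, Q, implies(P, Q)))
    print(x, P, Q, implies(P, Q))
# 1 False False True
# 3 False True True
# 6 True True True
```

When $P$ is false, $Q$ may come out either way, yet $P \Rightarrow Q$ is true in every row.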

This is the most important factor in proof by contradiction. Such a proof has exactly the same format as the one above. To establish $P$ we prove an equivalent statement $(\neg P) \Rightarrow C$, where $C$ is some contradiction. So we start with the first line $L_1$ as $(\neg P)$, and end up with the last line $L_n$ being $C$. On the assumption that $(\neg P)$ is true, each succeeding line must also be true. But $L_n$ is manifestly false, being a contradiction. Hence $(\neg P)$ cannot be true, so $P$ must be true. In this way we establish the truth of $P$ ‘by contradiction’.

The proof of $P \Rightarrow Q$ by establishing the logically equivalent statement $(\neg Q) \Rightarrow (\neg P)$ also has the same basic structure, starting with the line $(\neg Q)$, and ending with the line $(\neg P)$.
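Both reformulations rest on logical equivalences that can be confirmed from truth tables. A minimal check, taking the contradiction $C$ to be $R \;\&\; (\neg R)$ (our own choice of example):

```python
from itertools import product

def implies(p, q):
    # Truth table for "p => q": false only when p is true and q is false.
    return (not p) or q

# Contrapositive: P => Q and (not Q) => (not P) agree in every row.
contrapositive_ok = all(
    implies(P, Q) == implies(not Q, not P)
    for P, Q in product([True, False], repeat=2)
)

# Contradiction: C = (R and not R) is false in every row,
# so (not P) => C has the same truth value as P itself.
contradiction_ok = all(
    implies(not P, R and not R) == P
    for P, R in product([True, False], repeat=2)
)
print(contrapositive_ok, contradiction_ok)  # True True
```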
This is the formal definition of the logical steps in a proof. What do we actually need to write down in practice? The next chapter provides a possible answer.


Exercises 


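1. Write down truth tables for the following compound statements:
(a) $P \Rightarrow (\neg P)$
(b) $((P \Rightarrow R) \;\&\; (Q \Rightarrow R)) \Rightarrow ((P \;\&\; Q) \Rightarrow R)$
(c) $(P \;\&\; Q) \Rightarrow (P \text{ or } Q)$
(d) $(P \Rightarrow Q) \text{ or } (Q \Rightarrow P) \text{ or } (\neg Q)$.
Which are tautologies?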
2. Write the following statements using quantifiers $\forall$, $\exists$ and state which of them are true:
(a) For every real number $x$ there exists a real number $y$ such that $y = x$.
(b) There exists a real number $y$ such that for every real number $x$, the sum $x + y$ is positive.
(c) For each irrational number $x$, there is an integer $n$ satisfying

$$x < n < x + 1.$$

(d) The square of every integer leaves remainder 0 or 1 on division by 4.
(e) The sum of the squares of two prime numbers which are not equal to 2 is an even number.


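3. Translate the following statements into prose:
(a) $\forall x \in \mathbb{R}\ \exists y \in \mathbb{R}: x^2 - 3xy + 2y^2 = 0$.
(b) $\exists y \in \mathbb{R}\ \forall x \in \mathbb{R}: x^2 - 3xy + 2y^2 = 0$.
(c) $\exists N \in \mathbb{N}\ \forall \varepsilon \in \mathbb{R}: [(\varepsilon > 0) \;\&\; (n > N)] \Rightarrow (1/n < \varepsilon)$.
(d) $\forall x \in \mathbb{N}\ \forall y \in \mathbb{N}\ \exists z \in \mathbb{N}: x + z = y$.
(e) $\forall x \in \mathbb{Z}\ \forall y \in \mathbb{Z}\ \exists z \in \mathbb{Z}: x + z = y$.
Read your translations carefully, and if you think they sound stilted, rewrite them in a more flowing style (but don’t change their meaning!). State which of (a)-(e) are true and which are false, giving a reason in each case.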


4. In each of the following cases, write out truth tables and say whether the two statements are equivalent or not.
(a) $\neg[P \;\&\; (\neg P)]$, $\quad P \text{ or } (\neg P)$
(b) $P \Rightarrow Q$, $\quad (\neg P) \;\&\; Q$
(c) $P \Rightarrow Q$, $\quad (\neg Q) \;\&\; P$
(d) $(P \Rightarrow Q) \;\&\; R$, $\quad P \Rightarrow (Q \;\&\; R)$
(e) $[P \;\&\; (\neg Q)] \Rightarrow [R \;\&\; (\neg R)]$, $\quad P \Rightarrow Q$.

5. Use truth tables (where possible) to verify the rules of inference listed
in the section ‘logical deductions’. 


6. Which of the following are logically correct deductions?

(a) If an International Weapons Limitation Agreement is signed, or the United Nations approve a disarmament plan, then shares in the arms industry will slump. But armament shares will not slump, so an International Weapons Limitation Agreement will not be signed.

(b) If Britain leaves the European Union or if the trade deficit is reduced, the price of butter will fall. If Britain stays in the European Union exports will not increase. The trade deficit will increase unless exports are increased. Therefore the price of butter will not fall.

(c) Some politicians are honest. Some women are politicians. Therefore some women politicians are honest.

(d) If I do not work hard I will sleep. If I am worried I will not sleep. Therefore if I am worried I will work hard.


7. Consider the statement

$$x < y \text{ but } y > z$$

in the following cases:
(a) $x = 1,\ y = 2,\ z = 3$
(b) $x = 1,\ y = 2,\ z = 0$
(c) $x = 2,\ y = 1,\ z = 3$
(d) $x = 2,\ y = 1,\ z = 0$
Which cases yield a true statement? Use this information to draw up a truth table for ‘but’ and check that

$$(P \text{ but } Q) \Rightarrow (P \;\&\; Q)$$

is a tautology.
Do the same for ‘since’ and ‘therefore’ and compare with ‘implies’. What happens for ‘unless’?


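8. What are the negations of the following statements?
(a) $\forall x: (P(x) \;\&\; Q(x))$
(b) $\exists x: (P(x) \Rightarrow Q(x))$
(c) $\forall x \in \mathbb{R}\ \exists y \in \mathbb{R}: x > y$
(d) $\forall x \in \mathbb{R}\ \forall y \in \mathbb{R}\ \exists z \in \mathbb{Q}: x + y > z$.

Are (c) and (d) true or false?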
9. Prove by contradiction the following theorems:

(a) If $x, y \in \mathbb{R}$ and $y < x + \varepsilon$ for all $\varepsilon > 0$ ($\varepsilon \in \mathbb{R}$), then $y \le x$.

(b) For all real numbers $x$, either $\sqrt{3} + x$ is irrational or $\sqrt{3} - x$ is irrational.

(c) There is no smallest rational number greater than $\sqrt{2}$.


10. Consider the connectives $\neg$, &, or, $\Rightarrow$. Show that

$$(P \Rightarrow Q) \Leftrightarrow ((\neg P) \text{ or } Q)$$
$$(P \text{ or } Q) \Leftrightarrow \neg[(\neg P) \;\&\; (\neg Q)]$$

and hence that any compound statement can be written in terms of the connectives $\neg$, & alone. Is it possible to write every compound statement in terms of just one of the connectives $\neg$, &, or, $\Rightarrow$?

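11. Define the stroke connective | by the truth table

P  Q  P|Q
t  t   f
t  f   t
f  t   t
f  f   t

and show that

$$P \mid Q \Leftrightarrow (\neg P) \text{ or } (\neg Q).$$

Show further that
(a) $(\neg P) \Leftrightarrow P \mid P$
(b) $(P \;\&\; Q) \Leftrightarrow (P \mid Q) \mid (Q \mid P)$
(c) $(P \text{ or } Q) \Leftrightarrow (P \mid P) \mid (Q \mid Q)$
(d) $(P \Rightarrow Q) \Leftrightarrow P \mid (Q \mid Q)$.
Hence deduce that any compound statement may be written using only the stroke connective.
Remark: this may be economical in terms of connectives, but is

$$(((P \mid P) \mid Q) \mid ((P \mid P) \mid Q)) \mid (Q \mid Q)$$

easier to read than $((\neg P) \;\&\; Q) \Rightarrow Q$? They are equivalent...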
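The closing remark can be confirmed mechanically. The sketch below (function names are our own) builds the long expression from the stroke connective alone and compares it, row by row, with $((\neg P) \;\&\; Q) \Rightarrow Q$:

```python
from itertools import product

def stroke(p, q):
    # P | Q is false only when P and Q are both true (the table above).
    return not (p and q)

def implies(p, q):
    # Truth table for "p => q".
    return (not p) or q

def stroke_form(P, Q):
    # (((P|P)|Q) | ((P|P)|Q)) | (Q|Q)
    inner = stroke(stroke(P, P), Q)
    return stroke(stroke(inner, inner), stroke(Q, Q))

def usual_form(P, Q):
    # ((not P) & Q) => Q
    return implies((not P) and Q, Q)

table = [(P, Q, stroke_form(P, Q), usual_form(P, Q))
         for P, Q in product([True, False], repeat=2)]
for row in table:
    print(row)
```

Both columns come out true in every row, so the two expressions are not only equivalent but tautologies.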


12. Look back at your responses to the questions at the end of the first
chapter and reflect on any change in sophistication that has occurred. 
CHAPTER 7


Mathematical Proof 


In the last chapter we looked at the logical use of language in mathematics and how the truth of a statement can be deduced from given ones. We showed how a proof may be thought of as a sequence of logical deductions. In practice this formal definition of ‘proof’ does not provide a satisfactory way of writing proofs: to include every single step leads to a stereotyped format and is usually unbearably long-winded (see [8]). In this chapter we look at how proofs are actually written by practising mathematicians. In addition to the underlying logical skeleton, the writing of mathematical proofs needs a sense of judgement about how much detail is appropriate: what must be included and what may safely be left out. Too little detail may omit vital portions of the argument; too much may obscure the overall story.
We begin by taking an actual proof, written in normal mathematical style, 
and comparing it with the formal structure of the previous chapter. 


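Theorem 7.1: If $(a_n)$, $(b_n)$ are sequences of real numbers such that $a_n \to a$ and $b_n \to b$ as $n \to \infty$, then $a_n + b_n \to a + b$.

Proof: Let $\varepsilon > 0$. Since $a_n \to a$ there exists $N_1$ such that

$$n > N_1 \Rightarrow |a_n - a| < \tfrac{1}{2}\varepsilon.$$

Since $b_n \to b$ there exists $N_2$ such that

$$n > N_2 \Rightarrow |b_n - b| < \tfrac{1}{2}\varepsilon.$$

Let $N = \max(N_1, N_2)$. If $n > N$ then $|a_n - a| < \tfrac{1}{2}\varepsilon$ and $|b_n - b| < \tfrac{1}{2}\varepsilon$, so, by the triangle inequality,

$$|(a_n + b_n) - (a + b)| \le |a_n - a| + |b_n - b| < \tfrac{1}{2}\varepsilon + \tfrac{1}{2}\varepsilon = \varepsilon.$$

Hence $a_n + b_n \to a + b$, as required.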




To analyse the structure of this proof, let us break it down line by line, 
adding a few words here and there to make the construction clearer. 

First look carefully at the statement of the theorem and note the hypotheses that are given and the consequence that is to be proved. The theorem is in the form $P \Rightarrow Q$, where $P$ combines two given hypotheses:


Hypotheses:

$H_1$. $(a_n)$ is a sequence of real numbers and $a_n \to a$.
$H_2$. $(b_n)$ is a sequence of real numbers and $b_n \to b$.

The consequence $Q$ to be proved states that:

Consequence to be Proved: $Q$: $a_n + b_n \to a + b$.


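The proof, as given, consists of the following lines:

Proof:
$L_1$. Let $\varepsilon > 0$.
$L_2$. Since $a_n \to a$ there exists $N_1$ such that $n > N_1 \Rightarrow |a_n - a| < \tfrac{1}{2}\varepsilon$.
$L_3$. Since $b_n \to b$ there exists $N_2$ such that $n > N_2 \Rightarrow |b_n - b| < \tfrac{1}{2}\varepsilon$.
$L_4$. Let $N = \max(N_1, N_2)$.
$L_5$. If $n > N$ then $|a_n - a| < \tfrac{1}{2}\varepsilon$ and $|b_n - b| < \tfrac{1}{2}\varepsilon$.
$L_6$. So $|(a_n + b_n) - (a + b)| \le |a_n - a| + |b_n - b|$ by the triangle inequality.
$L_7$. $|a_n - a| + |b_n - b| < \tfrac{1}{2}\varepsilon + \tfrac{1}{2}\varepsilon$.
$L_8$. $\tfrac{1}{2}\varepsilon + \tfrac{1}{2}\varepsilon = \varepsilon$.
$L_9$. (There exists $N = \max(N_1, N_2)$ such that) $n > N \Rightarrow |(a_n + b_n) - (a + b)| < \varepsilon$.
$L_{10}$. $a_n + b_n \to a + b$, as required.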


Lines $L_1$ and $L_2$ are the definition of the limit $a_n \to a$, while lines $L_1$ and $L_3$ are the definition of the limit $b_n \to b$. These involve the implicit step that if $\varepsilon > 0$ then $\tfrac{1}{2}\varepsilon > 0$, so $\tfrac{1}{2}\varepsilon$ can be used in the definition of limit. In principle we ought to write out these short deductions explicitly, but in practice the steps are omitted when they are known parts of our technique.

Line $L_4$ is something new: a definition of the symbol $N$ in terms of $N_1$ and $N_2$. This definition could be omitted if we desired, and each occurrence of $N$ in the proof replaced by $\max(N_1, N_2)$ without any real change in the proof. In practice, however, it is common to use new symbols to stand for complex concepts built up from known ones, in order to keep the notation looking simple.
Line $L_5$ follows from $L_2$, $L_3$, and $L_4$, although the statement that $n > N$ implies $n > N_1$ and $n > N_2$ is taken for granted.

Line $L_6$ subsumes some simple algebraic manipulations rearranging $|(a_n + b_n) - (a + b)|$ to give $|(a_n - a) + (b_n - b)|$, before using the triangle inequality to get the final result. This statement looks like a predicate in $n$, but tacitly we regard it as coming under the implied quantifier $\forall n > N$ in $L_5$.

Line $L_7$ follows from $L_5$ and $L_6$, again using an implicitly understood algebraic result, this time addition of inequalities.

Line $L_8$ is trivial algebra.

Line $L_9$ follows from lines $L_1$ to $L_8$. This is precisely the formal definition of the convergence of $a_n + b_n$ to $a + b$, which gives the final conclusion in $L_{10}$.
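To make the choice $N = \max(N_1, N_2)$ concrete, here is a numerical sketch with two sequences of our own choosing, $a_n = 1/n \to 0$ and $b_n = 3 - 2/n \to 3$. For these, $|a_n - 0| < \tfrac{1}{2}\varepsilon$ once $n > 2/\varepsilon$, and $|b_n - 3| < \tfrac{1}{2}\varepsilon$ once $n > 4/\varepsilon$, so $N_1 = \lfloor 2/\varepsilon \rfloor$ and $N_2 = \lfloor 4/\varepsilon \rfloor$ will do. Exact fractions avoid rounding issues; checking finitely many $n$ illustrates the proof rather than replacing it.

```python
from fractions import Fraction
import math

def a(n):
    # Hypothetical sequence a_n = 1/n, with limit a = 0.
    return Fraction(1, n)

def b(n):
    # Hypothetical sequence b_n = 3 - 2/n, with limit b = 3.
    return 3 - Fraction(2, n)

def N_for_sum(eps):
    # n > 2/eps gives |a_n - 0| = 1/n < eps/2, so N1 = floor(2/eps) works.
    N1 = math.floor(2 / eps)
    # n > 4/eps gives |b_n - 3| = 2/n < eps/2, so N2 = floor(4/eps) works.
    N2 = math.floor(4 / eps)
    # Line L_4 of the proof.
    return max(N1, N2)

eps = Fraction(1, 100)
N = N_for_sum(eps)
# Spot-check |(a_n + b_n) - (a + b)| < eps for the next 2000 values of n.
ok = all(abs((a(n) + b(n)) - (0 + 3)) < eps for n in range(N + 1, N + 2001))
print(N, ok)  # 400 True
```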

This analysis shows that mathematicians do not write out proofs in precisely the manner described in the previous chapter. Steps are omitted, both when hypotheses are introduced and when deductions are made; new definitions are brought in; the whole package is wrapped up in a flowing prose style in total contrast to a formal sequence of statements.

Why is this? In the first place, mathematicians were writing proofs long before they were logically analysed, so the prose style came first and continues to be used. But the main reason is that the omission of trivial detail and the use of new symbols for complicated constructions are part of the process of attempting to make the deductions more comprehensible. The human mind builds up theories by recognising familiar patterns and glossing over details that are well understood, so that it can concentrate on the new material. In fact it is limited by the amount of new information it can hold at any one time, and the suppression of familiar detail is often essential for a grasp of the total picture. In a written proof, the step-by-step logical deduction is therefore foreshortened where it is already a part of the reader’s basic technique, so that they can comprehend the overall structure more easily.

When working out a new theory, practising mathematicians tend to distinguish between well-established facts that are part of their technique, and those that are in the new material they are developing. They take the established ideas very much for granted, telescoping several steps into a single line where their technique is fluent, often without giving explicit references to where a proof of these results can be found. This is done in the confidence that, if they were challenged, they would be able to fill in the details (though it might tax their memory to recall them straight away!).

Newly established results constitute the heart of the theory that is being 
developed, and are therefore treated with greater care. They will be stated 
clearly as hypotheses when they are needed, and references to their proofs 
will be given. 
When to omit logical steps or references in a proof, and when to give them in full, is part of that elusive quality: mathematical style. Different mathematicians will differ in their opinion. The clue is to look at the context of a proof and see for whom it is intended. Thus the present reader is most probably a student whose experience comes mainly from explanatory textbooks and lectures. Here the balance is likely to lean towards fuller exposition. On the other hand communication between two experts might comprise very sketchy outlines concentrating on the important new details. Nevertheless, both extremes have in common the feature of omitting detail when it may be considered as part of the basic technique in the given context.

As a specific example, when studying analysis, the rules of arithmetic might be subsumed as part of the basic toolkit, but new ideas such as limits, continuity, and so on would be treated with greater respect. Theorems about these new ideas would be proved carefully, and later theorems would refer back explicitly to results established earlier on. In the proof above, arithmetical results were used without comment, though the triangle inequality was mentioned because it was felt to be sufficiently new to be worth reminding the reader about. In more advanced work the triangle inequality would become part of the underlying technique, and be used without special reference.

In principle, a proof as used by practising mathematicians has an underlying structure like the one described in the previous chapter, but the proof occurs in a context where certain results have become a standard part of the technique. So a proof of a statement $D$ from explicit hypotheses $H_1, \ldots, H_r$ now consists of a number of statements $L_1, \ldots, L_n$, where $L_n$ is $D$, and each $L_m$ is either:

(i) a known truth, which is either a simple deduction from the hypotheses or from the contextual technique,

or

(ii) a deduction from the previous lines $L_1, \ldots, L_{m-1}$ using formal logic and known truths from the contextual technique.


The proof is written in a mixture of prose and mathematical symbolism that makes the logical structure clear. Steps are omitted if the deductions are clear from the context, and new symbols may be introduced to simplify notation. Similarly, an actual proof of a statement $P \Rightarrow Q$ will have the same underlying structure as the formal, logical one, but making tacit use of the contextual technique.
Sometimes the context may be so clear that no explicit hypotheses are mentioned. For example:


Theorem 7.2 (Euclid): There exist infinitely many primes.

Proof: Suppose that there exist only a finite number of primes, say $p_1, \ldots, p_n$. The number

$$N = 1 + p_1 p_2 \cdots p_n$$

is divisible by some prime $p$. But $p$ cannot be any of $p_1, \ldots, p_n$ since the latter all leave remainder 1 on dividing $N$. This contradicts our assumption that $p_1, \ldots, p_n$ is the complete list of primes.


This proof depends on the context of arithmetic of whole numbers, including factorisation of numbers into primes. It is given in the form of a proof by contradiction. Let $P$ be the statement ‘there exist infinitely many primes’. The first line is ‘suppose $\neg P$’, and the proof thereafter follows the usual line of argument in search of a contradiction. Then $\neg P$ must be false, so $P$ must be true.
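The construction in Euclid's proof can be run on an actual list of primes. Note that $N$ itself need not be prime; the proof only needs some prime factor of $N$. A sketch using trial division (the helper names are our own):

```python
def smallest_prime_factor(n):
    # Trial division: the least divisor d >= 2 of n is necessarily prime.
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime

def euclid_step(primes):
    # N = 1 + p_1 p_2 ... p_n, as in the proof.
    N = 1
    for p in primes:
        N *= p
    N += 1
    # Every listed prime leaves remainder 1 on dividing N ...
    assert all(N % q == 1 for q in primes)
    # ... so the prime factor found below cannot be on the list.
    return N, smallest_prime_factor(N)

print(euclid_step([2, 3, 5, 7, 11, 13]))  # (30031, 59)
```

Here $30031 = 59 \times 509$ is composite, yet its prime factor 59 is missing from the list, exactly as the argument requires.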

In a proof like this we must keep a careful eye on our contextual material for the presence of logical flaws in the parts of the proof that have been omitted, as well as on the actual symbols on the paper. For instance, what is wrong with the following ‘proof’?


Theorem(?) 7.3: The largest integer is 1.

Proof (?): Suppose not. Let $n$ be the largest integer. Then $n > 1$. Now $n^2$ is also an integer, and $n^2 > n \times 1 = n$. So $n^2 > n$, which contradicts $n$ being the largest integer. Therefore our initial assumption is false, so 1 is the largest integer.


Where is the flaw? Think about it before reading on. 

The flaw is in the statement ‘Let $n$ be the largest integer’. This is not the correct negation of ‘1 is the largest integer’. The negation is ‘1 is not the largest integer’, which pulls apart into ‘some $n > 1$ is the largest integer, or there is no largest integer’. With this statement substituted, the contradiction fails to materialise, since $n^2 > n$ does not contradict the second alternative, ‘there is no largest integer’. This logical flaw is disguised by the informality of the proof. It needs a lot of experience to avoid traps like these.
Axiomatic Systems


To provide a firm basis for the contextual material used, we must start somewhere. We do this by taking certain explicit statements as axioms, which are assumed to be true; all other results in the theory are deduced from these. According to taste, these deductions are called theorems, propositions, lemmas, corollaries, and so on. The words ‘theorem’ and ‘proposition’ are often regarded as being interchangeable, some authors sticking exclusively to one or the other. We prefer to use the word ‘proposition’ to describe an ordinary run-of-the-mill result, reserving ‘theorem’ for something more important. In this way the structure of the theory can be seen more clearly, with important theorems standing out in relief from the background of propositions.

To give even more shape to the contours of the theory, and to reduce the strain in particularly long proofs, constituent parts of a proof may be separated out and proved before using them in the proof of a theorem or proposition. Such a preliminary result is called a lemma. There may be several lemmas preceding a major theorem, so that when the proof of the main result is reached, all the spadework has been done, and all that is left is to put the pieces together. In this way it is possible to make the proof of the theorem itself a much more streamlined affair, with its salient features clearly delineated and not concealed by the details, which have been subordinated to the lemmas.

The complement of a lemma, which precedes a theorem, is a corollary, which follows it. A corollary is a result that can be deduced very simply from a theorem (or proposition) and immediately follows it. Sometimes the proof of a corollary is so obvious, because of the context, that it is omitted, or ‘left to the reader’.

In chapter 2 we looked at intuitive ideas of the real numbers and proved results in that context. To treat the subject formally, we will have to select certain properties of arithmetic as basic axioms and then build logically on these. (If we are sensible, we will use all the guile we have developed in intuitive arithmetic to suggest to us which way we should go in our formal development.) In chapter 8 we will look at suitable axioms for the natural numbers, before moving on to other number systems. Once we have a firm foundation for arithmetic, we can use it as contextual material to go on to more advanced theories. When handling vector spaces, or analysis, or geometry, arithmetical results can be subsumed and we can concentrate on the next level of deduction. At each stage it must be made clear which type of result may be used without comment, and which must be documented carefully. Sometimes the author of a textbook, or a lecturer, may fail to mention this explicitly because it is inherent in the context.
Proof Comprehension and Self-Explanation


The question now arises: how can you make sense of a proof when the writer of the proof has made various stylistic decisions to express the essential ideas, but has glossed over details that are assumed to be implicitly well known? The answer is that when you read a proof, it is essential to consider each line in turn and to explain to yourself why each successive line is justified. This process is called self-explanation and you can practise what it means by working through the appendix at the end of this book. It involves mentally giving a mathematical reason why each successive line follows from earlier ones. You can do this silently in your own mind, or by making written comments on the page. It requires more than simply repeating or paraphrasing what is written in the proof. By making a mental effort to justify the status of each successive line in a proof, you are much more likely to make firmer links in your brain than you would by passively reading one line after another.

To see self-explanation in the context of this chapter, consider the following proof that was given earlier:


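Theorem: If $(a_n)$, $(b_n)$ are sequences of real numbers such that $a_n \to a$ and $b_n \to b$ as $n \to \infty$, then $a_n + b_n \to a + b$.

Proof: Let $\varepsilon > 0$. Since $a_n \to a$ there exists $N_1$ such that

$$n > N_1 \Rightarrow |a_n - a| < \tfrac{1}{2}\varepsilon.$$

Since $b_n \to b$ there exists $N_2$ such that

$$n > N_2 \Rightarrow |b_n - b| < \tfrac{1}{2}\varepsilon.$$

Let $N = \max(N_1, N_2)$. If $n > N$ then $|a_n - a| < \tfrac{1}{2}\varepsilon$ and $|b_n - b| < \tfrac{1}{2}\varepsilon$, so, by the triangle inequality,

$$|(a_n + b_n) - (a + b)| \le |a_n - a| + |b_n - b| < \tfrac{1}{2}\varepsilon + \tfrac{1}{2}\varepsilon = \varepsilon.$$

Hence $a_n + b_n \to a + b$, as required.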


Read each line in turn and explain to yourself why the successive lines can be justified. Why does the first line simply state ‘Let $\varepsilon > 0$’? (The reason relates to the fact that you are asked to prove that $a_n + b_n \to a + b$. This is expressed mathematically by the statement that begins ‘given $\varepsilon > 0$’, and you have to find an $N$ such that if $n > N$, then $|(a_n + b_n) - (a + b)| < \varepsilon$.) Having written down ‘Let $\varepsilon > 0$’, you must use what you are given, namely that $a_n \to a$ and $b_n \to b$, to find such an $N$. Now explain to yourself how this information is used in lines 2 and 3 and why you use $\tfrac{1}{2}\varepsilon$ rather than $\varepsilon$. Then go on to reflect on each line in turn to see where it comes from. Is it a definition? Is it a new symbol introduced to make the argument look simpler? Are there implicit assumptions that are easily justified and can be omitted? Is it a deduction from previous lines? If so, precisely which lines?

Do this now, and do it seriously. 

Then read the appendix in this book on How to Read Proofs: The ‘Self-Explanation’ Strategy. Research using eye-tracking devices to find out what the reader is looking at when they read a proof shows that students using self-explanation techniques are more successful in making sense of the proof and retaining the ideas over a period of time (see [3]). If you make meaningful links between ideas, they are more likely to be manipulated easily in the mind. If you don’t, then the ideas are more likely to be diffuse and less likely to form a basis for making sense in the longer term.


Examination Questions 


One situation that is of great concern to students is what constitutes a proof 
acceptable in an examination. To a certain extent, the answer depends on the 
examiner, but part of the anxiety is due to uncertainty about the context. In 
a book, the context of a statement is usually clear from its position. A proof 
in chapter 7 is obviously allowed to assume results from chapters 1-6. But in 
an examination it may not be clear at which level a proof is required. Do all 
the steps have to be included? What can safely be missed out? 

If the question is well posed, it will make the context clear. The phrase ‘show from first principles ...’ asks for a careful proof from the basic definitions and axioms. A question on the more advanced parts of a subject will not expect this kind of answer, and it is safe to assume any preceding material that is well established as contextual material for that level, never going into greater detail than is appropriate for the concepts used in the question. In particular, if a question is asked in a manner that makes familiar use of certain ideas, then they can be used at the same level of familiarity in the solution. This avoids long-winded answers that include proofs of elementary material that ought to be subsumed into the context.

Levels may vary within a single question, with the first part being elementary and later parts more advanced. The wise student will sensibly increase the power of their reasoning to the appropriate context, freely using ideas commensurate with the new situation.
Exercises


1. Is the following a proof? If not, why not? Read it through and carefully explain to yourself how each line follows (or fails to follow) from the assumptions and the previous lines in the proof.


Theorem: For all real numbers $x$, $y$, $\tfrac{1}{2}(x + y) \ge \sqrt{xy}$.

Proof: Squaring and multiplying through by 4,

$$x^2 + 2xy + y^2 \ge 4xy,$$

so subtracting $4xy$ from each side,

$$x^2 - 2xy + y^2 \ge 0.$$

But $x^2 - 2xy + y^2 = (x - y)^2$, which is always $\ge 0$, so the theorem is proved.


2. Is the following a proof? If not, why not?

Theorem: The base angles of an isosceles triangle are equal.

Proof: Let $\triangle ABC$ be an isosceles triangle with sides $AB = AC$. Then $\triangle ABC$ is congruent to $\triangle ACB$ because the corresponding sides are equal: $AB = AC$, $BC = CB$, $AC = AB$. Hence corresponding angles are equal: in particular $\angle ABC = \angle ACB$.

(You may assume the usual geometrical properties of congruent triangles.)


3. Are the ‘proofs’ given in chapter 2 of this book genuine proofs within a
suitable context? If so, what is the context? If not, what are the proofs? 


4. Analyse the proof of proposition 3.10 from chapter 3, showing how
each statement follows from previous ones. What must be added 
to the proof as written to make it fit the logical definition of a 
proof? 

Repeat the exercise for other proofs from chapter 3. 


5. Find a mathematics textbook, select a theorem (whose proof is neither
too long nor too short) and analyse its structure. Which results are 
assumed as contextual background? 

Repeat the exercise for several other theorems, preferably from 
different texts and in different branches of mathematics. 
6. The following are axioms for a (hitherto undefined) mathematical structure known as a bureaucracy. This consists of:


a set B of bureaucrats, 
a set C of committees, 
a relation S between B and C (read serves on), 


satisfying the following axioms: 

(B1) Every bureaucrat serves on at least three different committees. 

(B2) Every committee is served on by at least three different bureaucrats.

(B3) Given two distinct committees, exactly one bureaucrat serves on 
both. 

(B4) Given two distinct bureaucrats, there is exactly one committee 
on which they both serve. 

Prove from these axioms that if the number of bureaucrats is finite, so is the number of committees. Prove that there are always at least seven bureaucrats in a bureaucracy, and find a bureaucracy with exactly seven bureaucrats.


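7. The following proof fits the logical definition. Analyse it to find out what is really going on.

Theorem: If $A$, $B$, $C$ are sets then $(A \cap B) \cap C = A \cap (B \cap C)$.

Proof:

$L_1$: Let $a \in (A \cap B) \cap C$.
$L_2$: $a \in A \cap B$.
$L_3$: Let $b \in A \cap (B \cap C)$.
$L_4$: $a \in C$.
$L_5$: $b \in B \cap C$.
$L_6$: $b \in B$.
$L_7$: $a \in B$.
$L_8$: $b \in C$.
$L_9$: $\{a, b\} \subseteq B$.
$L_{10}$: $b \in A$.
$L_{11}$: $a \in A$.
$L_{12}$: $b \in A \cap B$.
$L_{13}$: $a \in A \cap B$.
$L_{14}$: $\{a, b\} \subseteq A \cap B$.
$L_{15}$: $a \in B \cap C$.
$L_{16}$: $a \in A \cap (B \cap C)$.
$L_{17}$: $(A \cap B) \cap C \subseteq A \cap (B \cap C)$.
$L_{18}$: $b \in (A \cap B) \cap C$.
$L_{19}$: $(A \cap B) \cap C \supseteq A \cap (B \cap C)$.
$L_{20}$: $(A \cap B) \cap C = A \cap (B \cap C)$.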


Rewrite it in a sensible style to reveal the structure of the argument. 
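Independently of how the proof is organised, the identity itself can be sanity-checked by brute force over a small universe. This is our own illustration, not a proof: the universe $\{0, 1, 2\}$ is an arbitrary choice, and a finite check says nothing about arbitrary sets.

```python
from itertools import chain, combinations, product

universe = {0, 1, 2}  # hypothetical small universe chosen for illustration
# All 2**3 = 8 subsets of the universe.
subsets = [set(c) for c in chain.from_iterable(
    combinations(universe, r) for r in range(len(universe) + 1))]

# Compare (A ∩ B) ∩ C with A ∩ (B ∩ C) for all 8**3 = 512 triples.
assoc = all((A & B) & C == A & (B & C) for A, B, C in product(subsets, repeat=3))
print(len(subsets), assoc)  # 8 True
```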
PART III
The Development 
of Axiomatic Systems 


Now we turn to the number systems themselves, analysing their structure 
and aiming to find a formal list of axioms that will describe them precisely. 
We also show how to construct systems that satisfy these axioms, using the 
raw materials of set theory. This places our intuitive ideas on a firm basis, 
and lets us use them without logical qualms. 

Metaphorically, we are now constructing our building, or growing our 
plant: the important thing is to take as much care as is required to make 
sure that nothing goes wrong. This means a certain amount of attention to 
detail, and the result can look rather tortuous and pedantic. 

The attitude of mind demanded of the reader is now a little different. Although intuitive ideas may be used as a source of inspiration, nothing may be used as part of a proof unless it has been given a rigorous logical demonstration. It therefore becomes necessary to give rigorous proofs, from the axioms, of properties that, on an intuitive level, we already accept. We must do this in order to be sure that, in this axiomatic sense, they really are true and can be proved logically from the axioms. By doing so, we put our ideas on a sound basis.

In chapter 8 we give highly detailed proofs, even of statements that may seem obvious. However, from chapter 9 onwards, having established a rich schema of ideas proven rigorously from the axioms of a theory and from definitions made within that context, we use proven results from the context without revisiting the detail, which you could check for yourself, should that be necessary. This avoids the danger of losing track of the main outline beneath an accumulation of ever more elaborate detail. The step-by-step method, if carried too far, obscures the overall picture.


CHAPTER 8 


Natural Numbers and Proof 
by Induction 


What is a number? It took mathematicians a long time to get round to wondering what the answer was, and a lot longer to find one. The first step was to characterise natural numbers. It turned out that their most important defining feature wasn’t counting, or arithmetic: it was the possibility of proving theorems using mathematical induction. But at first sight, proof by induction does not seem to fit the pattern of proof described in the previous chapter. Look at a typical instance:


Proposition 8.1: The sum of the first $n$ natural numbers is $\tfrac{1}{2}n(n + 1)$.


Proof: This is trivially true for $n = 1$. If it is true for $n = k$, that is,

$$1 + 2 + \cdots + k = \tfrac{1}{2}k(k + 1),$$

then adding $k + 1$ to each side we obtain

$$1 + 2 + \cdots + (k + 1) = \tfrac{1}{2}k(k + 1) + (k + 1) = \tfrac{1}{2}(k + 1)(k + 2).$$

This is the sum of the first $k + 1$ natural numbers, and the formula is true for $n = k + 1$. By induction, the formula is true for all natural numbers.
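A computer can check the base case, the inductive step, and the formula itself for as many values as we like, but only for finitely many. The sketch below (our own illustration) stops at $n = 1000$; however far such a check goes, it is evidence rather than a proof.

```python
def formula(n):
    # Closed form from Proposition 8.1: n(n + 1)/2, which is always an integer.
    return n * (n + 1) // 2

# Base case n = 1.
assert formula(1) == 1

# Inductive step, checked only for k = 1, ..., 1000:
# adding k + 1 to k(k+1)/2 gives (k+1)(k+2)/2.
assert all(formula(k) + (k + 1) == formula(k + 1) for k in range(1, 1001))

# Direct comparison with the running sum 1 + 2 + ... + n.
total = 0
for n in range(1, 1001):
    total += n
    assert total == formula(n)
print("checked n = 1..1000")
```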


Many people regard this type of proof as an ‘and so on ...’ sort of argument. The truth of the statement is established for $n = 1$; then, having established the general step from $n = k$ to $n = k + 1$, this is applied for $k = 1$ to get us from $n = 1$ to $n = 2$, then used again to go from $n = 2$ to $n = 3$, and so on, as far as we wish to go. For instance, we could reach $n = 593$ after 592 applications of the general step. The only trouble with thinking this way is that reaching large values of $n$ requires a large number of applications of the general step. We can never actually cover all natural numbers in a finite number of deductions if we proceed one number at a time. But a proof, by definition, comprises a finite number of lines.

The way out of this dilemma is to remove the ‘and so on...’ part from the 
proof and place it squarely in the definition of the natural numbers. Proof by 
induction then fits naturally into the type of mathematical proof described 
in the last chapter. 


Natural Numbers 


The natural numbers form a highly non-trivial set, because we cannot write 
down a complete list of elements: they go on forever. Describing them 
satisfactorily needs a different approach. Fortunately, the intuitive idea of 
counting can easily be modelled in a set-theoretic way. We begin with 1, 
then comes 2, then 3, and we carry on in this way, naming each successive 
number as far as we wish. 

To grasp the concept of the set of natural numbers ‘all in one go’, we regard
this succession as a function on the set $\mathbf{N}$ of natural numbers. That is, we seek
a function $s : \mathbf{N} \to \mathbf{N}$ with suitable properties. Here $s$ stands for ‘successor’,
and $s(1) = 2$, $s(2) = 3$, and so on. Two obvious properties that we need are:


(i) $s$ is not surjective (because $s(n) \ne 1$ for any $n \in \mathbf{N}$),
(ii) $s$ is injective ($s(m) = s(n)$ implies that $m = n$).


There is a third vital property, giving rise to induction proofs, as follows: 


(iii) Suppose that $S \subseteq \mathbf{N}$ is such that $1 \in S$; and for all $n \in \mathbf{N}$, if $n \in S$ then
$s(n) \in S$. Then $S = \mathbf{N}$.


In words, (iii) says that a subset containing 1, which includes $s(n)$
whenever it contains $n$, exhausts the whole set of natural numbers.

Surprisingly, these three properties are all that are required to describe the 
natural numbers. An axiomatic basis for arithmetic requires only that we 
postulate the existence of a set with these three properties. 

For technical reasons, it is more profitable to start with 0 rather than 1.
Although in counting we usually start with 1, the empty set has 0 elements
and it is useful to be able to say so. Again, in arithmetic it is convenient to
have the zero element. For these and other reasons we start with 0 in our
axiomatic system, and to avoid confusion with our intuitive concept $\mathbf{N}$ of
the natural numbers we use $\mathbb{N}_0$ to denote the formal system. The
‘blackboard bold’ font $\mathbb{N}$ distinguishes the formal concept of the natural numbers
from the informal one, and the subscript 0 reminds us that 0 is included.


1 Textbooks would become very expensive if not, for a start. 
We then obtain the Peano Axioms for the natural numbers, named after
Giuseppe Peano, the Italian mathematician responsible for this approach at the
end of the nineteenth century:


Peano Axioms: Suppose that there exists a set $\mathbb{N}_0$ and a function
$s : \mathbb{N}_0 \to \mathbb{N}_0$ such that


(N1) $s$ is not surjective: there exists $0 \in \mathbb{N}_0$ such that $s(n) \ne 0$ for any
$n \in \mathbb{N}_0$.

(N2) $s$ is injective: if $s(m) = s(n)$ then $m = n$.

(N3) If $S \subseteq \mathbb{N}_0$ is such that $0 \in S$ and $n \in S \Rightarrow s(n) \in S$ for all $n \in \mathbb{N}_0$,
then $S = \mathbb{N}_0$.


There is no guarantee that any such set $\mathbb{N}_0$ exists, so we take its existence
as a basic axiom for mathematics:


Existence Axiom for Natural Numbers: There exists a set $\mathbb{N}_0$ and a
function $s : \mathbb{N}_0 \to \mathbb{N}_0$ satisfying (N1)–(N3).


From these slender beginnings we can develop all the usual properties of 
arithmetic, then later build up the other number systems including real and 
complex numbers. We will also see how axiom (N3) enshrines the idea of 
proof by induction, as in the following simple case: 


Proposition 8.2: If $n \in \mathbb{N}_0$, $n \ne 0$, then there exists a unique $m \in \mathbb{N}_0$ such
that $n = s(m)$.

Proof: Let $S = \{n \in \mathbb{N}_0 \mid n = 0 \text{ or } n = s(m) \text{ for some } m \in \mathbb{N}_0\}$. Certainly
$0 \in S$. If $n \in S$ then either $n = 0$, in which case $s(n) = s(0)$ so $s(n) \in S$; or
$n = s(m)$ and $s(n) = s(s(m))$ where $s(m) \in \mathbb{N}_0$, so $s(n) \in S$. Hence, by axiom
(N3), $S = \mathbb{N}_0$. This shows that the required $m$ exists. Uniqueness follows
from (N2).
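To make axioms (N1), (N2) and proposition 8.2 concrete, here is a toy Python model of $\mathbb{N}_0$, offered as a sketch under our own assumptions: `ZERO` is the empty tuple and the successor wraps its argument in a one-element tuple, so no successor equals `ZERO` (N1) and `s(m) == s(n)` forces `m == n` (N2). The names `s`, `pred` and `to_int` are hypothetical helpers; `pred` returns the unique $m$ with $n = s(m)$ promised by proposition 8.2.

```python
# Toy model of (N0, s), an illustrative assumption: ZERO is the empty
# tuple and s(n) = (n,). No value of s equals ZERO (axiom N1), and
# s(m) == s(n) only when m == n (axiom N2).
ZERO = ()


def s(n):
    """Successor: wrap n in a one-element tuple."""
    return (n,)


def pred(n):
    """Return the unique m with n == s(m) (proposition 8.2); n must not be ZERO."""
    if n == ZERO:
        raise ValueError("0 is not a successor (axiom N1)")
    (m,) = n  # unpack the single entry
    return m


def to_int(n):
    """Count how many times s was applied, for display only."""
    count = 0
    while n != ZERO:
        n = pred(n)
        count += 1
    return count


three = s(s(s(ZERO)))
print(to_int(three), to_int(pred(three)))  # 3 2
```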


Proposition 8.2 tells us that 0 is the only element that is not a successor, a
property that distinguishes it from all other elements. The set $\mathbb{N} = \mathbb{N}_0 \setminus \{0\}$
will be called the natural numbers. We shall denote $s(0)$ by 1. This element
lies in $\mathbb{N}$ and will prove of paramount importance.

Look at the proof of proposition 8.2 once more. Its essential structure
consists of defining a set $S$, then


(i) showing that $0 \in S$,
(ii) showing that $n \in S \Rightarrow s(n) \in S$,
(iii) invoking axiom (N3) to deduce that $S = \mathbb{N}_0$.

A proof by induction always has this format.
In practice $S$ is of the form


$$S = \{n \in \mathbb{N}_0 \mid P(n)\}$$


where $P(n)$ is a predicate known to be true or false for each $n \in \mathbb{N}_0$. The
statements (i), (ii), (iii) translate into


(i)′ showing $P(0)$ is true,
(ii)′ showing that if $P(n)$ is true then $P(s(n))$ is true,
(iii)′ invoking (N3) to deduce that $P(n)$ is true for all $n \in \mathbb{N}_0$.


Axiom (N3) finishes the proof without a breath of an ‘and so on...’ type of 
argument. 

The reader will recognise the basic skeleton of this method in
proposition 8.1, except that we began at 1 instead of 0 and wrote $n + 1$ instead of $s(n)$.
Later we show that the same method applies starting at any $k \in \mathbb{N}_0$, in
particular at $k = 1$, so the proposition at the beginning of the chapter is just a
simple example of an induction proof depending on the use of axiom (N3).

In practice, axiom (N3) may not be mentioned explicitly. The proof may be
phrased entirely in terms of a predicate $P(n)$, and, when steps (i)′ and (ii)′ are
established, the conclusion ‘$P(n)$ is true for all $n$’ is said to be established ‘by
induction’. You should always interpret this as an implicit use of axiom (N3),
which is referred to as the induction axiom for this very reason. During the
course of such a proof, the assumption that $P(n)$ is true is called the induction
hypothesis and the proof that $P(n) \Rightarrow P(s(n))$ is called the induction step. For
the moment, we make the set $S$ explicit.


Definition by Induction 


The most important task is to set up arithmetic in $\mathbb{N}_0$. To get started, we must
define the basic operations of addition and multiplication.
We can define addition by setting

$$m + 0 = m \tag{8.1}$$

for all $m \in \mathbb{N}_0$, and then, once we have calculated $m + n$, we can calculate
$m + s(n)$ by

$$m + s(n) = s(m + n). \tag{8.2}$$

The induction axiom seems tailor-made for definitions, as well as proofs. If
$S$ is the set of $n \in \mathbb{N}_0$ for which $m + n$ is defined, then $0 \in S$ (by (8.1)), and, if
$n \in S$ then $m + n$ is defined and by (8.2) we can use $s(m + n)$ to define $m + s(n)$,
so that $s(n) \in S$.
However, there is a subtle point here, which involves a difference
between proof and definition. In an induction proof, the induction step
$n \in S \Rightarrow s(n) \in S$ involves only a demonstration that if $n \in S$ is true then
so is $s(n) \in S$. But when making an inductive definition of addition, in order
to be able to define the sum $m + s(n)$ as $s(m + n)$, it is essential first to know
the value of $m + n$.

Our intuitive model $\mathbf{N} \cup \{0\}$ tells us that for any $n$ we can start at 0, count
on 1, 2, 3, ..., and eventually reach $n$. For instance, if $n = 101$ we can start
from the definition (8.1) at 0 and use step (8.2) 101 times to find $m + n$.
Unfortunately we have not established any such principle for $\mathbb{N}_0$; indeed,
given $m \in \mathbb{N}_0$, we don’t yet know that if we start with 0 and form successors
$1 = s(0)$, $2 = s(1)$, and so on, we eventually reach $m$. Moreover, our long-term
objective is to eliminate ‘and so on...’ arguments. To remove this flaw
we prove a general principle about the validity of such definitions based only
on the Peano axioms. It helps to formulate the theorem in the general case of
the repeated composition of any function $f$ in any set $X$ and then apply it to
particular functions, such as the successor function $s$, to make definitions by
recursion. The proof is quite technical (probably one of the most intricate in
the whole book). It may help to use the self-explanation technique, working
slowly through each step and explaining it to yourself.


Theorem 8.3 (Recursion Theorem): If $X$ is a set, $f : X \to X$ a
function, and $c \in X$, then there exists a unique function $\phi : \mathbb{N}_0 \to X$ such
that


(i) $\phi(0) = c$,
(ii) $\phi(s(n)) = f(\phi(n))$ for all $n \in \mathbb{N}_0$.


Pre-Proof Discussion. Essentially, we start with a function $f : X \to X$ and
$c \in X$ and apply $f$ again and again to get


$$\phi(0) = c,\quad \phi(1) = f(c) = f^1(c),\quad \phi(2) = f(f(c)) = f^2(c),\ \text{and so on,}$$


to give the function $\phi(n) = f^n(c)$ (where we can consider $f^0(c)$ to be $c$). To
eliminate the ‘and so on’ argument, we use the set-theoretic definition of a
function $\phi : \mathbb{N}_0 \to X$ as a set of ordered pairs and consider those subsets $U$
of $\mathbb{N}_0 \times X$ satisfying


(a) $(0, c) \in U$,
(b) $(n, x) \in U \Rightarrow (s(n), f(x)) \in U$.


There are many such subsets, including the whole set $U = \mathbb{N}_0 \times X$. We show
that the one we require is the intersection of all such subsets.
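Proof: Let $\phi$ be the intersection of all subsets $U$ of $\mathbb{N}_0 \times X$ satisfying


$$(0, c) \in U, \tag{8.3}$$
$$(n, x) \in U \Rightarrow (s(n), f(x)) \in U. \tag{8.4}$$


Each such $U$ satisfies (8.3) and (8.4), so their intersection $\phi$ does too; it is
the smallest subset of $\mathbb{N}_0 \times X$ with these properties. Let
$$S = \{n \in \mathbb{N}_0 \mid (n, x) \in \phi \text{ for some } x \in X\}.$$


Then $0 \in S$ by (8.3). And by (8.4), $n \in S \Rightarrow s(n) \in S$. By induction, $S = \mathbb{N}_0$.

So every $n \in \mathbb{N}_0$ does have some $x \in X$ such that $(n, x) \in \phi$.
However, to show that $\phi$ is a function, we also need to prove that $x$ is unique.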
Let


$$T = \{n \in \mathbb{N}_0 \mid (n, x) \in \phi \text{ for a unique } x \in X\}.$$


We seek to prove that $T = \mathbb{N}_0$ by induction.

Starting with $n = 0$, we know that $(0, c) \in \phi$. If also $(0, d) \in \phi$ with $c \ne d$, let
$\phi' = \phi \setminus \{(0, d)\}$. Then $\phi'$ satisfies (8.3); and if $(n, x) \in \phi'$ then $(s(n), f(x)) \in \phi$
and is not $(0, d)$ because $s(n) \ne 0$ by axiom (N1). So $(s(n), f(x)) \in \phi'$ and $\phi'$
satisfies (8.4). Since $\phi$ is the smallest set satisfying (8.3) and (8.4), this is a
contradiction; hence no such $d$ exists, so $0 \in T$.

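The induction step that $n \in T$ implies $s(n) \in T$ uses a similar argument,
as follows.

If $n \in T$ then $(n, x) \in \phi$ for precisely one $x \in X$. Since $\phi$ satisfies (8.4),
we have $(s(n), f(x)) \in \phi$, so to establish that $s(n) \in T$ we must
show that there is no other ordered pair $(s(n), y) \in \phi$ with $y \ne f(x)$. If there were
such a pair, consider $\phi^* = \phi \setminus \{(s(n), y)\}$. Again, since $0 \ne s(n)$, we know that
$\phi^*$ satisfies (8.3).

To check (8.4) we need to prove that


$$(m, z) \in \phi^* \Rightarrow (s(m), f(z)) \in \phi^* \quad \text{for all } m \in \mathbb{N}_0.$$


This is true for $m = n$: there is a unique $x \in X$ such that $(n, x) \in \phi$, so $z = x$,
and $(s(n), f(x)) \in \phi$ by (8.4) and is not $(s(n), y)$ since $y \ne f(x)$.
For $m \ne n$ we have $(s(m), f(z)) \in \phi$ by (8.4), and $s(m) \ne s(n)$ by (N2). Hence
$(s(m), f(z)) \ne (s(n), y)$, so $(s(m), f(z)) \in \phi^*$. Either way, $\phi^*$ satisfies (8.4) and
again we have a contradiction.
By induction, $T = \mathbb{N}_0$.

Thus $\phi$ is a function, and (8.3) and (8.4) say exactly that $\phi(0) = c$ and
$\phi(s(n)) = f(\phi(n))$. Finally, any function satisfying (i) and (ii) is one of the sets $U$,
so it contains $\phi$; two functions $\mathbb{N}_0 \to X$, one contained in the other, are equal, so $\phi$ is unique.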
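Computationally, the function $\phi$ of theorem 8.3 can be sketched in Python, under the assumption that non-negative Python integers stand in for $\mathbb{N}_0$ with $s(n) = n + 1$. The helper name `recursive_function` is our own. The loop applies $f$ exactly $n$ times, which is the intuitive ‘and so on’ picture; what the theorem adds is the guarantee that a unique such $\phi$ exists on the whole of $\mathbb{N}_0$.

```python
from typing import Callable, TypeVar

X = TypeVar("X")


def recursive_function(f: Callable[[X], X], c: X) -> Callable[[int], X]:
    """Return phi with phi(0) = c and phi(n + 1) = f(phi(n)).

    Assumption: non-negative Python ints model N0, with s(n) = n + 1.
    """

    def phi(n: int) -> X:
        if n < 0:
            raise ValueError("phi is defined on N0 only")
        value = c
        for _ in range(n):  # apply f exactly n times
            value = f(value)
        return value

    return phi


double = recursive_function(lambda x: 2 * x, 1)    # phi(n) = 2**n
tally = recursive_function(lambda w: w + "|", "")  # phi(n) = n strokes
print(double(10), tally(3))  # 1024 |||
```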


Definitions that employ this theorem are said to be recursive. The
recursion theorem opens the floodgates to give a wide range of examples. These
include:
(1) Addition. $\alpha_m : \mathbb{N}_0 \to \mathbb{N}_0$, $\alpha_m(n) = m + n$, defined by
$$\alpha_m(0) = m,$$
$$\alpha_m(s(n)) = s(\alpha_m(n)).$$
Here $c = m$, $f = s$.
(2) Multiplication. $\mu_m : \mathbb{N}_0 \to \mathbb{N}_0$, $\mu_m(n) = mn$, defined by
$$\mu_m(0) = 0,$$
$$\mu_m(s(n)) = \mu_m(n) + m.$$
Here $c = 0$, $f(r) = r + m$.
(3) Powers. $\pi_m : \mathbb{N}_0 \to \mathbb{N}_0$, $\pi_m(n) = m^n$, defined by
$$\pi_m(0) = 1,$$
$$\pi_m(s(n)) = m\pi_m(n).$$
Here $c = 1$, $f(r) = rm$.
(4) Repeated composition of a map $f : X \to X$, defined by
$$f^0(x) = x,$$
$$f^{s(n)}(x) = f(f^n(x))$$
for all $x \in X$.
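Examples (1) to (3) can be sketched in Python in the same way, again assuming non-negative integers model $\mathbb{N}_0$ with `succ(n) = n + 1`; the names `recurse`, `add`, `mul` and `power` are hypothetical. Each operation is built only from the one before it, exactly as in the recursive definitions: addition from the successor, multiplication from addition, powers from multiplication.

```python
def succ(n: int) -> int:
    """The successor s(n); non-negative Python ints model N0 here."""
    return n + 1


def recurse(c, f, n: int):
    """Return phi(n), where phi(0) = c and phi(s(k)) = f(phi(k)) (theorem 8.3)."""
    value = c
    for _ in range(n):
        value = f(value)
    return value


def add(m: int, n: int) -> int:
    # (1) alpha_m(0) = m, alpha_m(s(n)) = s(alpha_m(n)): c = m, f = s.
    return recurse(m, succ, n)


def mul(m: int, n: int) -> int:
    # (2) mu_m(0) = 0, mu_m(s(n)) = mu_m(n) + m: c = 0, f(r) = r + m.
    return recurse(0, lambda r: add(r, m), n)


def power(m: int, n: int) -> int:
    # (3) pi_m(0) = 1, pi_m(s(n)) = m * pi_m(n): c = 1, f(r) = rm.
    return recurse(1, lambda r: mul(r, m), n)


print(add(3, 4), mul(3, 4), power(2, 5))  # 7 12 32
```

Because every call unwinds to repeated successors, `power` is very slow for large inputs; the point is the structure of the definitions, not efficiency.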


Laws of Arithmetic 


With addition and multiplication properly defined by recursion, it is now
relatively easy to prove the usual laws of arithmetic using induction. The
proofs are not always easy to find without guidance, and you are encouraged
to follow through the arguments and explain the proof to yourself. By
building up your own knowledge schemas, you may be able to find slicker proofs
than ours.

For reference, we note the definitions:


(α1) $m + 0 = m$, (α2) $m + s(n) = s(m + n)$,
(μ1) $m0 = 0$, (μ2) $ms(n) = mn + m$.


Now from (α2) and (α1) we see that $m + s(0) = s(m + 0) = s(m)$. We have
already denoted $s(0)$ by 1, so $s(m) = m + 1$.


Lemma 8.4: For all $m \in \mathbb{N}_0$,


(a) $0 + m = m$,
(b) $1 + m = s(m)$,
(c) $0m = 0$,
(d) $1m = m$.


Proof: In each case, use induction on $m$. We verify (a) and leave the rest as
an exercise for you to explain for yourself. Let
$$S = \{m \in \mathbb{N}_0 \mid 0 + m = m\}.$$
Trivially $0 \in S$ by (α1). If $m \in S$ then $0 + m = m$, so by (α2),
$$0 + s(m) = s(0 + m) = s(m).$$
Therefore $s(m) \in S$.


By (N3), $S = \mathbb{N}_0$.


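Theorem 8.5: For all $m, n, p \in \mathbb{N}_0$,


(a) $(m + n) + p = m + (n + p)$

(b) $m + n = n + m$

(c) $(mn)p = m(np)$

(d) $mn = nm$

(e) $m(n + p) = mn + mp$.
Proof: (a) is proved by induction on $p$, using

$$S = \{p \in \mathbb{N}_0 \mid (m + n) + p = m + (n + p)\}.$$

First

$$\begin{aligned}
(m + n) + 0 &= m + n && \text{by } (\alpha 1)\\
&= m + (n + 0) && \text{by } (\alpha 1),
\end{aligned}$$

so $0 \in S$. Second, if $p \in S$ then
$$(m + n) + p = m + (n + p), \tag{8.5}$$
so

$$\begin{aligned}
(m + n) + s(p) &= s((m + n) + p) && \text{by } (\alpha 2)\\
&= s(m + (n + p)) && \text{by } (8.5)\\
&= m + s(n + p) && \text{by } (\alpha 2)\\
&= m + (n + s(p)) && \text{by } (\alpha 2),
\end{aligned}$$

implying $s(p) \in S$. By induction, $S = \mathbb{N}_0$.
(b) is proved by induction on $n$ using

$$S = \{n \in \mathbb{N}_0 \mid m + n = n + m\}.$$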


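Lemma 8.4(a) shows that $0 \in S$. If $n \in S$ then

$$m + n = n + m \tag{8.6}$$

and then

$$\begin{aligned}
m + s(n) &= s(m + n) && \text{by } (\alpha 2)\\
&= s(n + m) && \text{by } (8.6)\\
&= n + s(m) && \text{by } (\alpha 2)\\
&= n + (1 + m) && \text{by lemma 8.4(b)}\\
&= (n + 1) + m && \text{by theorem 8.5(a)}\\
&= s(n) + m,
\end{aligned}$$

hence $s(n) \in S$. By induction $S = \mathbb{N}_0$, establishing (b).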
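It is convenient to deal with (e) next, using induction on $p$. Let

$$S = \{p \in \mathbb{N}_0 \mid m(n + p) = mn + mp\}.$$

Then

$$\begin{aligned}
m(n + 0) &= mn && \text{by } (\alpha 1)\\
&= mn + 0 && \text{by } (\alpha 1)\\
&= mn + m0 && \text{by } (\mu 1),
\end{aligned}$$

implying $0 \in S$. If $p \in S$, then

$$m(n + p) = mn + mp \tag{8.7}$$

so

$$\begin{aligned}
m(n + s(p)) &= ms(n + p) && \text{by } (\alpha 2)\\
&= m(n + p) + m && \text{by } (\mu 2)\\
&= (mn + mp) + m && \text{by } (8.7)\\
&= mn + (mp + m) && \text{by theorem 8.5(a)}\\
&= mn + ms(p) && \text{by } (\mu 2).
\end{aligned}$$

Therefore $s(p) \in S$, and induction gives $S = \mathbb{N}_0$.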


The proof of (c) is now relatively straightforward and of the same nature
as previous proofs. This leaves (d), which turns out to be a little trickier. Let

$$S = \{n \in \mathbb{N}_0 \mid mn = nm\}.$$

Now $0 \in S$ by lemma 8.4(c). If $n \in S$ then

$$mn = nm \tag{8.8}$$

and

$$\begin{aligned}
ms(n) &= mn + m && \text{by } (\mu 2)\\
&= nm + m && \text{by } (8.8).
\end{aligned}$$
If we could show that this equalled $s(n)m$ we would have finished, but
unfortunately we don’t know this yet. However, we can prove it by a second
induction on $m$. Let

$$T = \{m \in \mathbb{N}_0 \mid nm + m = s(n)m\}.$$

Then $0 \in T$, and if $m \in T$ then
$$nm + m = s(n)m. \tag{8.9}$$
So

$$\begin{aligned}
ns(m) + s(m) &= n(m + 1) + (m + 1)\\
&= (nm + n) + (m + 1) && \text{by (e)}\\
&= nm + (n + (m + 1)) && \text{by (a)}\\
&= nm + ((n + m) + 1) && \text{by (a)}\\
&= nm + ((m + n) + 1) && \text{by (b)}\\
&= nm + (m + (n + 1)) && \text{by (a)}\\
&= (nm + m) + (n + 1) && \text{by (a)}\\
&= s(n)m + s(n) && \text{by } (8.9)\\
&= s(n)s(m) && \text{by } (\mu 2).
\end{aligned}$$

Hence $s(m) \in T$ and $T = \mathbb{N}_0$. Returning to where we left off, $s(n) \in S$ and
$S = \mathbb{N}_0$. This proves (d).

Having performed this massive induction exercise we can now use these
arithmetic results freely to provide a coherent context in which we can prove
more sophisticated ideas without overburdening the proof with too much
detail. To simplify notation and make it look more familiar, we replace $s(n)$
by $n + 1$. The induction axiom now becomes:


If $S \subseteq \mathbb{N}_0$, $0 \in S$, and $n \in S \Rightarrow n + 1 \in S$, then $S = \mathbb{N}_0$.
Axiom (N2) translates into
$$m + 1 = n + 1 \Rightarrow m = n,$$


and this can be extended by induction to give:


Proposition 8.6: For all $m, n, q \in \mathbb{N}_0$,


(a) $m + q = n + q \Rightarrow m = n$
(b) $q \ne 0$, $mq = nq \Rightarrow m = n$.


Proof: (a) Use induction on $q$. Let


$$S = \{q \in \mathbb{N}_0 \mid m + q = n + q \Rightarrow m = n\}.$$

Trivially $0 \in S$. If $q \in S$, suppose that

$$m + (q + 1) = n + (q + 1).$$
By theorem 8.5(a),

$$(m + q) + 1 = (n + q) + 1,$$
hence by (N2)

$$m + q = n + q,$$
and since $q \in S$,
$$m = n.$$


Hence $q + 1 \in S$ and by induction $S = \mathbb{N}_0$.
(b) Let


$$S = \{m \in \mathbb{N}_0 \mid \text{for all } n \in \mathbb{N}_0,\ q \ne 0 \text{ and } mq = nq \Rightarrow m = n\}.$$
To show $0 \in S$, suppose that $q \ne 0$, and
$$nq = 0q = 0.$$


Then $q = p + 1$ for some $p$. If $n \ne 0$ then $n = r + 1$. Then $nq = (pr + p + r) + 1$,
so it cannot be 0. Hence $n = 0$, so $0 \in S$.
Now suppose $m \in S$ and $q \ne 0$, with


$$(m + 1)q = nq.$$


As before, $n \ne 0$ (since $(m + 1)q = (mq + p) + 1 \ne 0$), so $n = r + 1$ for some $r \in \mathbb{N}_0$.
Then $mq + q = rq + q$. By part (a), $mq = rq$; by hypothesis $m = r$. Therefore
$m + 1 = n$, so $m + 1 \in S$, and by induction $S = \mathbb{N}_0$.


We now discuss subtraction. Suppose that $p = r + q$. By proposition 8.6, $r$
is determined uniquely by $p$ and $q$. We may therefore denote $r$ by $p - q$. For
$m, n \in \mathbb{N}_0$ we define a relation $\ge$ by


$$m \ge n \iff \exists\, r \in \mathbb{N}_0 : m = r + n.$$


Given $m, n \in \mathbb{N}_0$, the difference $m - n$ is defined only when $m \ge n$. This
being so, we can verify various rules of subtraction, such as


$$\begin{aligned}
m - (n - r) &= (m - n) + r && \text{for } m \ge n \ge r,\\
m + (n - r) &= (m + n) - r && \text{for } n \ge r,\\
m(n - r) &= mn - mr && \text{for } n \ge r.
\end{aligned}$$


All are routine; for instance the last follows by considering


$$n = s + r \quad (\text{since } n \ge r),$$
whence
$$mn = m(s + r) = ms + mr.$$


Thus by definition,


$$mn - mr = ms = m(n - r)$$


since $s = n - r$.
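Subtraction can likewise be sketched in Python, assuming integers model $\mathbb{N}_0$. The recursion used here, $m - 0 = m$ and $m - (n + 1) = \mathrm{pred}(m - n)$, is our own illustration rather than the text’s definition (which takes $m - n$ to be the unique $r$ with $m = r + n$). Because the hypothetical helper `pred` is undefined at 0, the computation succeeds exactly when $m \ge n$; the tests also spot-check the rule $m(n - r) = mn - mr$ on small values.

```python
def pred(n: int) -> int:
    """The unique m with n = m + 1 (proposition 8.2); undefined at 0."""
    if n == 0:
        raise ValueError("0 has no predecessor")
    return n - 1


def sub(m: int, n: int) -> int:
    """Return the unique r with m = r + n; defined only when m >= n.

    Recursion on n: m - 0 = m and m - (n + 1) = pred(m - n).
    """
    r = m
    for _ in range(n):
        r = pred(r)  # raises ValueError once n exceeds m
    return r


print(sub(7, 3))  # 4
```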
We may also consider division, and in the case $m = rn$ ($n \ne 0$) we denote $r$
by $m/n$. We discuss when division is possible in a later section.


Ordering the Natural Numbers 


We have already defined the relation $\ge$ on $\mathbb{N}_0$. The other order relations are
given by

$$m > n \iff m \ge n \ \&\ m \ne n,$$

$$m \le n \iff n \ge m,$$

$$m < n \iff n > m.$$


We must prove that these are indeed order relations in the sense of chapter 4.
For example:


Proposition 8.7: $m \ge n$, $n \ge p \Rightarrow m \ge p$ for all $m, n, p \in \mathbb{N}_0$.


Proof: There exist $r, s \in \mathbb{N}_0$ such that $m = r + n$, $n = s + p$. Hence
$m = r + (s + p) = (r + s) + p$, so $m \ge p$.


A second property of order relations is also easy:


Proposition 8.8: If $m, n \in \mathbb{N}_0$ and $m \ge n$, $n \ge m$, then $m = n$.


Proof: There exist $r, t \in \mathbb{N}_0$ such that $m = r + n$, $n = t + m$, so $m = (r + t) + m$.
Since also $m = 0 + m$, proposition 8.6(a) gives $r + t = 0$. We cannot have $t \ne 0$, since this would
imply $t = q + 1$ for some $q \in \mathbb{N}_0$, by proposition 8.2, and then $0 = (r + q) + 1$,
contradicting axiom (N1). Therefore $t = 0$, so $n = m$.


The third property of an order relation requires a more technical proof,
which is postponed until proposition 8.13. However, it is a simple matter
to see that the relations behave as expected, relative to the arithmetical
operations of $\mathbb{N}_0$:


Proposition 8.9: For all $m, n, p, q \in \mathbb{N}_0$,


(a) if $m \ge n$, $p \ge q$, then $m + p \ge n + q$,
(b) if $m \ge n$, $p \ge q$, then $mp \ge nq$.

Proof: (a) There exist $r, s \in \mathbb{N}_0$ such that $m = r + n$, $p = s + q$. Hence, after
simplification, we find that $m + p = (r + s) + (n + q)$.
(b) Similarly, $mp = nq + (rs + ns + rq)$.


The zero element 0 is the smallest element of $\mathbb{N}_0$, in the following sense:


Lemma 8.10: If $m \in \mathbb{N}_0$ then $m \ge 0$.


Proof: $m = m + 0$ by (α1).


The element 1 is the next smallest:


Lemma 8.11: If $m \in \mathbb{N}_0$ and $m > 0$ then $m \ge 1$.


Proof: By proposition 8.2, if $m \ne 0$ then $m = q + 1$ for some $q \in \mathbb{N}_0$. Hence
$m \ge 1$.


We could go on to show that $2 = 1 + 1$ is the next smallest after 1, then
$3 = 2 + 1$ is the next smallest after 2, and so on. It is more efficient to prove a
general proposition:


Proposition 8.12: If $m, n \in \mathbb{N}_0$ and $m > n$ then $m \ge n + 1$.
Proof: We have $m = n + r$ for some $r \in \mathbb{N}_0$, and $r \ne 0$ since $m \ne n$. By
proposition 8.2, $r = q + 1$ for some $q \in \mathbb{N}_0$, hence $m = (n + 1) + q$, and
$m \ge n + 1$.


Now we can complete the proof that $\ge$ is an order relation in the sense of
chapter 4.


Proposition 8.13: The relation $\ge$ is a (weak) order relation on $\mathbb{N}_0$.
Proof: We must prove that for all $m, n, p \in \mathbb{N}_0$,


(WO1) $m \ge n$ & $n \ge p \Rightarrow m \ge p$,
(WO2) Either $m \ge n$ or $n \ge m$,
(WO3) If $m \ge n$ and $n \ge m$ then $m = n$.


We have already established (WO1) and (WO3) in propositions 8.7 and 8.8.
To verify (WO2), let


$$S(m) = \{n \in \mathbb{N}_0 \mid m \ge n \text{ or } n \ge m\}.$$


We aim to prove that $S(m) = \mathbb{N}_0$ for all $m \in \mathbb{N}_0$. Now for a given $m$, we have
$0 \in S(m)$ since $m \ge 0$. Next suppose that $n \in S(m)$. Either $m \ge n$ or $n \ge m$.
If $n \ge m$ then $n + 1 \ge m$. If $m \ge n$ then either $n = m$, and $m \le n + 1$, or
$m > n$, in which case $m \ge n + 1$ by proposition 8.12. Thus $n + 1 \in S(m)$. By
induction, $S(m) = \mathbb{N}_0$.


As remarked in chapter 4, it follows that $>$ is a strict order relation. That
is, for all $m, n, p \in \mathbb{N}_0$,


$$m > n \ \&\ n > p \Rightarrow m > p.$$


Exactly one of $m > n$, $m = n$, $m < n$ is true (trichotomy law). The next result
is almost a converse to proposition 8.9.


Proposition 8.14: For all $m, n, p, q \in \mathbb{N}_0$,


(a) $m + q > n + q \Rightarrow m > n$,
(b) $q \ne 0$, $mq > nq \Rightarrow m > n$.


Proof: (a) If $m \not> n$ then $m \le n$ by trichotomy. But $m \le n$ implies
$m + q \le n + q$ by proposition 8.9(a). This contradicts the hypothesis, so part (a) is
proved. Part (b) follows a similar format.


Proposition 8.14 is of course valid when $>$ is replaced by $\ge$, and is then an
exact converse to proposition 8.9.


Uniqueness of $\mathbb{N}_0$


The set $\mathbb{N}_0$, its arithmetic, and order are essentially unique in a very precise
sense. As a down-to-earth illustration, the French counting system ‘un, deux,
trois, ...’, while undeniably different from the English ‘one, two, three, ...’,
possesses the same arithmetical structure. To see this, we observe that
translating from French to English by replacing ‘un’ by ‘one’, ‘deux’ by ‘two’,
and so on, turns valid French arithmetic into valid English arithmetic, and
conversely. It is the same with $\mathbb{N}_0$.

Suppose that we can find another set $\mathbb{N}_0'$ with a function $s' : \mathbb{N}_0' \to \mathbb{N}_0'$
satisfying the corresponding axioms (N′1)–(N′3). Then we define $\phi : \mathbb{N}_0 \to \mathbb{N}_0'$
by


$$\phi(0) = 0',$$
$$\phi(s(n)) = s'(\phi(n))$$


for all $n \in \mathbb{N}_0$. This function exists by the recursion theorem, as does
$\psi : \mathbb{N}_0' \to \mathbb{N}_0$ given by
$$\psi(0') = 0,$$
$$\psi(s'(m)) = s(\psi(m))$$


for all $m \in \mathbb{N}_0'$. A simple induction proof now shows that $\phi$ and $\psi$ are mutual
inverses. Let $S = \{n \in \mathbb{N}_0 \mid \psi\phi(n) = n\}$ to show that $\psi\phi = 1_{\mathbb{N}_0}$, and similarly
prove that $\phi\psi = 1_{\mathbb{N}_0'}$. Induction on $n$ also shows that


$$\phi(m + n) = \phi(m) + \phi(n),$$
$$\phi(mn) = \phi(m)\phi(n)$$


and
$$m \ge n \iff \phi(m) \ge \phi(n).$$


Thus the bijection $\phi$ between $\mathbb{N}_0$ and $\mathbb{N}_0'$ preserves the arithmetic and order:
we can use it to ‘translate’ valid results in one into valid results in the other.

Such a bijection is called an order isomorphism. The word ‘isomorphism’
alone is normally used for a bijection that preserves all relevant
arithmetical (algebraic) operations. The word ‘order’ is used to emphasise that the
ordering is also preserved. This usage extends to a variety of mathematical
systems.

In this sense there is only one possible structure for a system satisfying
(N1)–(N3): all such systems are order isomorphic. The whole ethos of the
natural numbers is encapsulated in three simple axioms.

One system that we expect to satisfy these axioms is our intuitive concept
$\mathbf{N} \cup \{0\}$, so this ought to correspond in the obvious way to $\mathbb{N}_0$. The vital
difference is that the properties we expect of $\mathbf{N} \cup \{0\}$ have been built up by
example and experience, whereas those of $\mathbb{N}_0$ have been deduced logically
from the axioms. Thus all of the usual properties that we expect of $\mathbf{N} \cup \{0\}$
can be given a rigorous justification in $\mathbb{N}_0$. We could, for example, name the
elements of $\mathbb{N}_0$ using decimal notation and calculate addition and
multiplication tables. At this stage it is more profitable to omit such technicalities on
the understanding that they are routine.
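The isomorphism $\phi$ can be sketched in Python using a second model of the Peano axioms chosen purely for illustration: $\mathbb{N}_0'$ is taken to be strings of tally marks, with $0'$ the empty string and $s'$ appending one stroke `|`, while non-negative integers model $\mathbb{N}_0$. The function names `phi`, `psi` and `add_tally` are hypothetical. `phi` and `psi` follow the recursive definitions above, and `add_tally` defines addition in $\mathbb{N}_0'$ by the analogues of (α1) and (α2); the tests spot-check that the maps are mutually inverse and that $\phi(m + n) = \phi(m) + \phi(n)$.

```python
def phi(n: int) -> str:
    """phi(0) = 0' and phi(s(n)) = s'(phi(n)); 0' is "" and s' appends '|'."""
    word = ""
    for _ in range(n):
        word = word + "|"  # apply s'
    return word


def psi(word: str) -> int:
    """psi(0') = 0 and psi(s'(w)) = s(psi(w)): one successor per stroke."""
    n = 0
    for _ in word:
        n = n + 1  # apply s
    return n


def add_tally(w: str, v: str) -> str:
    """Addition in the tally model: w + 0' = w and w + s'(v) = s'(w + v)."""
    result = w
    for _ in v:
        result = result + "|"
    return result


print(phi(3), psi("||||"))  # ||| 4
```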


Counting 


As in real life, we can count using natural numbers. Let
$$N(n) = \{m \in \mathbb{N} \mid 1 \le m \le n\}$$
for $n \in \mathbb{N}$, and let


$$N(0) = \emptyset.$$
A set $X$ is said to have $n$ elements ($n \in \mathbb{N}_0$) if there is a bijection


$$f : N(n) \to X.$$
This models the primitive idea of counting. If we point to the elements $f(1)$,
$f(2)$, ..., $f(n)$ in turn and call out ‘1, 2, ..., $n$’ then this is precisely how we
count.


Fig. 8.1 Counting 


The useful notational device $N(0) = \emptyset$ lets us apply the process to the
empty set as well. If a set has $n$ elements for some $n \in \mathbb{N}_0$ then it is said to be
finite; otherwise it is infinite.

This manner of counting does not depend on the order in which we count
the elements of the set. That is, given a bijection $f : N(n) \to X$ and a
bijection $g : N(m) \to X$, we always have $m = n$. To see this, let $\phi = f^{-1}g$. Then
$\phi : N(m) \to N(n)$ is a bijection. We prove by induction on $m$ that if there is a
bijection between $N(m)$ and $N(k)$ for some $k \in \mathbb{N}_0$, then $m = k$.

This is certainly true for $m = 0$. Suppose it is true for some $m \in \mathbb{N}_0$, and
consider a bijection


$$\theta : N(m + 1) \to N(k).$$


Now $k \ne 0$, or else $\theta^{-1}$ would be a bijection from $N(0)$ to $N(m + 1)$, and the case
$m = 0$ would give $m + 1 = 0$, which contradicts (N1). Hence $k = n + 1$ for
some $n \in \mathbb{N}_0$. We now construct a bijection $\theta^* : N(m + 1) \to N(n + 1)$ for
which $\theta^*(m + 1) = n + 1$. If it is already the case that $\theta(m + 1) = n + 1$ then
we take $\theta^* = \theta$. If not then $\theta(q) = n + 1$ for some $q \le m$, and we define

$$\begin{aligned}
\theta^*(q) &= \theta(m + 1),\\
\theta^*(m + 1) &= n + 1,\\
\theta^*(r) &= \theta(r) \quad \text{otherwise.}
\end{aligned}$$


Restrict $\theta^*$ to a map


$$\theta^*|_{N(m)} : N(m) \to N(n).$$
This is clearly a bijection, so by the induction hypothesis $m = n$. Hence $m + 1 = n + 1 = k$,
completing the induction step.
This validates the intuitive idea of counting within the formal system. 
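The induction step above is constructive, and can be sketched in Python under our own conventions: a bijection $\theta : N(m) \to N(k)$ is stored as a dictionary, and the hypothetical helper `shrink` performs the swap that produces $\theta^*$ and then restricts it to $N(m - 1)$. Repeating this $m$ times (the hypothetical `sizes_agree`) reaches the base case $N(0) = \emptyset$, whose image must also be empty.

```python
from itertools import permutations


def shrink(theta: dict[int, int], top: int, k: int) -> dict[int, int]:
    """From a bijection theta: N(top) -> N(k), with top >= 1, build the
    bijection N(top - 1) -> N(k - 1) used in the induction step."""
    if k == 0:
        # theta would map a non-empty set onto the empty set N(0).
        raise ValueError("k cannot be 0")
    theta_star = dict(theta)
    q = next(x for x, y in theta.items() if y == k)  # theta(q) = n + 1
    if q != top:
        theta_star[q] = theta[top]  # theta*(q) = theta(m + 1)
        theta_star[top] = k         # theta*(m + 1) = n + 1
    del theta_star[top]             # restrict theta* to N(m)
    return theta_star


def sizes_agree(theta: dict[int, int], m: int, k: int) -> bool:
    """Shrink a bijection N(m) -> N(k) down to N(0) and report whether k reaches 0."""
    while m > 0:
        theta = shrink(theta, m, k)
        m, k = m - 1, k - 1
    return k == 0  # base case: N(0) is empty, so its image N(k) must be too


# Every bijection N(4) -> N(4) shrinks all the way down to N(0) -> N(0).
print(all(sizes_agree(dict(zip(range(1, 5), p)), 4, 4) for p in permutations(range(1, 5))))  # True
```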


Von Neumann’s Brainwave 


As a diversion, we now mention John von Neumann’s brilliant method of 
describing natural numbers, announced in 1923. It is particularly suitable for 
counting, the number n being defined as a specific set with n elements. 

To start, there is only one choice for a set with 0 elements, so we put


$$0_v = \emptyset.$$


(Here the suffix $v$ stands for von Neumann.) We now have one object,
namely $0_v$, so we define


$$1_v = \{0_v\},$$


manifestly a set with 1 element. Now we have two objects $0_v$ and $1_v$, so we
define


$$2_v = \{0_v, 1_v\}.$$
It is now clear how to continue. Note that


$$\{0_v, 1_v\} = \{0_v\} \cup \{1_v\} = 1_v \cup \{1_v\}.$$


Having described
$$n_v = \{0_v, 1_v, \ldots, (n - 1)_v\}$$
we define
$$\begin{aligned}
(n + 1)_v &= n_v \cup \{n_v\}\\
&= \{0_v, \ldots, (n - 1)_v\} \cup \{n_v\}\\
&= \{0_v, \ldots, n_v\}.
\end{aligned}$$
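Von Neumann’s construction is easy to imitate in Python, assuming `frozenset` (Python’s immutable, hashable set type) as a stand-in for sets, so that sets can be elements of other sets. The helper names `successor` and `von_neumann` are our own; `successor(x)` computes $x \cup \{x\}$, the operation formalised in the next paragraph. The tests confirm, for small $n$, that $n_v$ has exactly $n$ elements and that each element of $n_v$ is also a proper subset of it.

```python
def successor(x: frozenset) -> frozenset:
    """Return x ∪ {x}."""
    return x | frozenset([x])


def von_neumann(n: int) -> frozenset:
    """Build n_v by starting from 0_v = ∅ and taking successors n times."""
    x = frozenset()
    for _ in range(n):
        x = successor(x)
    return x


three = von_neumann(3)
print(len(three))                                      # 3
print(all(von_neumann(k) in three for k in range(3)))  # True
```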
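This procedure can be made more formal as follows. For any set $X$ we let

$$\sigma(X) = X \cup \{X\}$$


be the successor of $X$. This has the bizarre property
$$X \in \sigma(X) \text{ and } X \subseteq \sigma(X).$$
Now a set $\Omega$ whose elements are sets is called inductive if


$$\emptyset \in \Omega,$$
$$X \in \Omega \Rightarrow \sigma(X) \in \Omega.$$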
To avoid an ‘and so on...’ definition, von Neumann postulated:
Axiom of Infinity: There exists an inductive set $\Omega$.


This set $\Omega$ may be bigger than we require. But if we let $\mathbb{N}_v$ be the
intersection of all inductive subsets of $\Omega$, then it is the smallest inductive subset.
Hence if $S \subseteq \mathbb{N}_v$ and $S$ is inductive, it follows that $S = \mathbb{N}_v$.

Since $\mathbb{N}_v$ is inductive, we have $\emptyset \in \mathbb{N}_v$, and $X \in \mathbb{N}_v \Rightarrow \sigma(X) \in \mathbb{N}_v$, so
$\sigma : \mathbb{N}_v \to \mathbb{N}_v$ is a function. Also $\emptyset \ne \sigma(n)$ for any $n \in \mathbb{N}_v$, since $n \in \sigma(n)$.
We shall prove $\sigma$ is injective.

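First note that if $m, n \in \mathbb{N}_v$ and $m \in n$, then $m \subseteq n$. For let


$$S = \{n \in \mathbb{N}_v \mid m \in n \Rightarrow m \subseteq n\}.$$


Trivially $\emptyset \in S$. Suppose $n \in S$ and $m \in \sigma(n)$. Then either $m \in n$ or $m = n$.
In either case $m \subseteq n \cup \{n\} = \sigma(n)$. Hence $S$ is an inductive subset of $\mathbb{N}_v$, so
$S = \mathbb{N}_v$.

Now suppose that $\sigma(m) = \sigma(n)$. Then $m \cup \{m\} = n \cup \{n\}$. Thus $m \in n \cup \{n\}$
and either $m \in n$ or $m = n$. By the above remark, $m \subseteq n$. Similarly $n \subseteq m$,
hence $m = n$ and $\sigma$ is injective.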

Gathering together these remarks, we find that $\mathbb{N}_v$ is a set, $\sigma : \mathbb{N}_v \to \mathbb{N}_v$ is
a function, $\emptyset \in \mathbb{N}_v$, and


(i) $\emptyset \ne \sigma(n)$ for any $n \in \mathbb{N}_v$,
(ii) $\sigma(m) = \sigma(n) \Rightarrow m = n$,
(iii) if $S \subseteq \mathbb{N}_v$, $\emptyset \in S$, and $n \in S \Rightarrow \sigma(n) \in S$, then $S = \mathbb{N}_v$.


These are the same as the Peano axioms, with $\mathbb{N}_v$ in place of $\mathbb{N}_0$, $\sigma$ in place of
$s$, and $\emptyset$ in place of 0. So von Neumann’s idea gives an alternative foundation
for the natural numbers, and his axiom of infinity acts as a substitute for the
existence axiom for the natural numbers. We could have used this approach
instead. However, the simplest way to count in von Neumann’s system is to
say that a set $X$ has $n$ elements if there is a bijection $f : n_v \to X$, that is,


$$f : \{0_v, 1_v, \ldots, (n - 1)_v\} \to X.$$


This corresponds to counting ‘$0_v$, $1_v$, ..., $(n - 1)_v$’ rather than the more
primitive ‘1, 2, 3, ..., $n$’ to which we are accustomed.


Other Forms of Induction 


Sometimes the induction step in a proof by induction needs more than the
assumption that $P(n)$ is true in order to deduce $P(n + 1)$. For example, we
may need to know the truth of $P(1)$, $P(2)$, ..., $P(n)$ before being able to pass
to $P(n + 1)$. This situation is governed by the so-called General Principle of
Induction. If


(GP1) $P(0)$ is true,
(GP2) the truth of $P(m)$ for all $m \in \mathbb{N}_0$ with $m \le n$ implies the truth of
$P(n + 1)$,


then $P(n)$ is true for all $n \in \mathbb{N}_0$.

At first sight this seems to be a genuine extension of the induction
principle, because the second statement seems to use more information. But if we
let $Q(n)$ be the predicate


$$P(0)\ \&\ P(1)\ \&\ \ldots\ \&\ P(n),$$
or more formally,
‘for all $m \in \mathbb{N}_0$ with $m \le n$, $P(m)$ is true’,
then we find that (GP1) and (GP2) become


(i) $Q(0)$ is true,
(ii) the truth of $Q(n)$ implies the truth of $Q(n + 1)$.


Thus the disguise of the ‘general’ principle is exposed: it is just the ordinary
principle for $Q(n)$, and in theory it is no more general than the usual principle
of induction. In practice, of course, it sometimes leads to simpler proofs.
With it we can prove a highly useful variant of the induction principle. First,
we say that a set $S$ has a least element $a$ if $a \in S$ and $a \le s$ for all $s \in S$. Then
we can state:


Theorem 8.15 (Well Ordering Principle): Every non-empty subset
$S \subseteq \mathbb{N}_0$ has a least element.


Proof: We have to show that if $\emptyset \ne S \subseteq \mathbb{N}_0$ then there exists $a \in S$ such that
for all $s \in S$ we have $a \le s$. For a contradiction, suppose no such $a$ exists. Let
$P(n)$ be the predicate $n \notin S$. Then $P(0)$ is true, for $0 \in S$ would imply that
0 is the least element of $S$ by lemma 8.10. Now suppose that $P(m)$ is true for
all $m \le n$, so that if $m \le n$ then $m \notin S$. If $s \in S$ then $s > n$ by trichotomy, so $s \ge n + 1$
by proposition 8.12. We could not have $n + 1 \in S$ since it would then be a
least element, so $n + 1 \notin S$ and $P(n + 1)$ is true. By the general principle of
induction $P(n)$ is true for all $n$, that is, $S$ is empty. This is a contradiction.


Another variation of the induction principle starts not at 0 but at some
other $k \in \mathbb{N}_0$. If
$P(k)$ is true, and
the truth of $P(m)$ for $m \ge k$ implies the truth of $P(m + 1)$,
then we may deduce that $P(n)$ is true for all $n \ge k$.
This reduces to the usual induction principle on putting $Q(n) = P(n + k)$.


Most often we meet this with k = 1. But in the next proposition, which we 
shall need elsewhere, we require k = 3. 


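Proposition 8.16 (General Associative Law): If $a_1, \ldots, a_n \in \mathbb{N}_0$, then
the sum $a_1 + \cdots + a_n$ takes the same value independently of the manner in
which brackets are inserted.


Proof: If $n = 3$, there are only two methods of bracketing, namely $(a_1 + a_2) + a_3$
and $a_1 + (a_2 + a_3)$. These are equal by theorem 8.5(a). Suppose the
proposition is true for all sums of at most $n$ terms, for some $n \ge 3$. Then without
ambiguity we may omit all brackets from a sum of $n$ or fewer numbers. Any bracketing
of $a_1 + \cdots + a_{n+1}$ has an outermost addition, so we must consider


$$(a_1 + \cdots + a_k) + (a_{k+1} + \cdots + a_{n+1})$$
and show that the value of this is independent of $k$. If $k = n$ this is already
$(a_1 + \cdots + a_n) + a_{n+1}$. Otherwise let


$$\begin{aligned}
a &= a_1 + \cdots + a_k,\\
b &= a_{k+1} + \cdots + a_n,\\
c &= a_{n+1}.
\end{aligned}$$

Then the expression is equal to


$$\begin{aligned}
a + (b + c) &= (a + b) + c\\
&= (a_1 + \cdots + a_n) + a_{n+1},
\end{aligned}$$


which does not depend on $k$. This completes the induction step.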

A similar proof works when addition is replaced by multiplication. 
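Proposition 8.16 can be checked by brute force for short sums. The Python sketch below, built around the hypothetical generator `bracketings`, enumerates every full bracketing of a sequence by choosing where the outermost operation splits it, just as the proof considers $(a_1 + \cdots + a_k) + (a_{k+1} + \cdots + a_{n+1})$, and evaluates each one. An enumeration like this only checks particular cases; the proof is what covers every $n$.

```python
import operator


def bracketings(terms, op=operator.add, symbol="+"):
    """Yield (expression, value) for every full bracketing of terms, in order."""
    if len(terms) == 1:
        yield str(terms[0]), terms[0]
        return
    # The outermost operation splits the sequence after position k.
    for k in range(1, len(terms)):
        for left, left_value in bracketings(terms[:k], op, symbol):
            for right, right_value in bracketings(terms[k:], op, symbol):
                yield f"({left} {symbol} {right})", op(left_value, right_value)


sums = list(bracketings([2, 3, 5, 7]))
print(len(sums), {value for _, value in sums})  # 5 {17}
products = list(bracketings([2, 3, 5, 7], operator.mul, "*"))
print({value for _, value in products})  # {210}
```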
Division 
Given $m, n \in \mathbb{N}_0$ with $n \ne 0$, it is not always possible to divide $n$ into $m$ and
obtain a solution in $\mathbb{N}_0$. For this to happen $m$ must be a multiple of $n$, that is
$m = qn$ for some $q \in \mathbb{N}_0$. If it does not happen, then the division process will
yield a remainder.


Theorem 8.17 (Division Algorithm): Given m, ne No with n 0, there 
exist unique elements q, r € No such that m = qn + randr < n. 
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Proof: Use induction on m. Let 
S={meNo|m=qn+rfor gq, re No, r <n}. 


Since 0 = 0n + 0, we have 0 € S. Suppose m € S. Then m = qn+rwithr < n, 
and 


m+l=qn+r+l1. (8.10) 
Nowr < nimplies r + 1 < n. So either r + 1 = n, when (8.10) becomes 
m+1=(q+1)n+0, 
or r+ 1 < n, when (8.10) becomes 
m+1=qn+(r+1) withr+1 <n. 


In either case, $m + 1 \in S$, so by induction $S = \mathbf{N}_0$.
To show that $q, r$ are unique, suppose that

$$m = qn + r = q'n + r'$$

where $r, r' < n$. Then

$$qn \leq m < (q + 1)n, \qquad q'n \leq m < (q' + 1)n.$$

Hence, by transitivity of the order relation, $qn < (q' + 1)n$, so by proposition
8.13, $q < q' + 1$. Then proposition 8.12 implies that $q \leq q'$. Similarly
$q' \leq q$, so $q = q'$. Proposition 8.6(a) now implies that $r = r'$.
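The existence half of the proof is constructive: starting from $0 = 0n + 0$, each step from $m$ to $m + 1$ either increases the remainder or, when $r + 1 = n$, resets it to 0 and increases the quotient. A minimal Python sketch of that step-by-step construction follows. The function name is hypothetical, Python integers stand in for $\mathbf{N}_0$, and the loop takes $m$ steps, so it illustrates the proof rather than offering an efficient division routine.

```python
# Illustrative sketch; divide_by_induction is our own name, not from the text.
def divide_by_induction(m: int, n: int) -> tuple[int, int]:
    """Return (q, r) with m == q*n + r and 0 <= r < n, built step by step
    from 0 = 0n + 0 exactly as in the inductive proof of theorem 8.17."""
    if n == 0:
        raise ValueError("n must be non-zero")
    if m < 0:
        raise ValueError("m must lie in N_0")
    q, r = 0, 0                 # base case: 0 = 0n + 0
    for _ in range(m):          # pass from k to k + 1, m times
        if r + 1 == n:          # case r + 1 = n: k + 1 = (q + 1)n + 0
            q, r = q + 1, 0
        else:                   # case r + 1 < n: k + 1 = qn + (r + 1)
            r += 1
    return q, r


print(divide_by_induction(17, 5))   # (3, 2), since 17 = 3 x 5 + 2
```

For any $m \geq 0$ and $n \geq 1$ it agrees with Python's built-in `divmod(m, n)`, which by the uniqueness part of the theorem must return the same pair.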


Factorisation 


We can now discuss factorisation into primes, and in particular prove
uniqueness. Only non-zero numbers are of interest, so for the remainder of
this chapter we work in $\mathbf{N} = \mathbf{N}_0 \setminus \{0\}$. First some straightforward definitions
are required.

We say that $k \in \mathbf{N}$ is a factor or divisor of $m \in \mathbf{N}$ if there exists $s \in \mathbf{N}$ such
that $m = ks$. We write $k \mid m$. Trivially 1 and $m$ are factors of $m$; any other
factor is called a proper factor. We call $m$ prime if $m \neq 1$ and $m$ has no proper
factors. (We exclude 1 for convenience, for example in the unique factorisation
theorem which follows.) It is easily seen that a factor $k$ of $m$ must lie in
the range $1 \leq k \leq m$, for if $k > m$ then since $s \geq 1$ we find that $ks > m$. A
proper factor therefore lies in the range $1 < k < m$.

If $k$ is a factor of two numbers $m, n \in \mathbf{N}$, it is called a common factor. Now
1 is always a common factor; if it is the only one we say that $m$ and $n$ are
coprime. Rather than characterising the highest common factor as the largest
of the common factors (which indeed it is) we choose to define it in a more 
useful way. 


Definition 8.18: We say that $h \in \mathbf{N}$ is the highest common factor of $m, n \in \mathbf{N}$
if $h$ is a common factor with the property that any other common factor $k$
must be a factor of $h$. We write

$$h = \operatorname{hcf}(m, n).$$


The Euclidean Algorithm 


The simplest way to prove that any two non-zero natural numbers have a 
highest common factor is to calculate it explicitly. There is a method for do- 
ing this, called the Euclidean algorithm for historical reasons, which depends 
on the following two facts: 


(i) If $r_1 = q_1 r_2$ then $r_2 = \operatorname{hcf}(r_1, r_2)$.
(ii) If $r_1 = q_1 r_2 + r_3$ with $r_3 \neq 0$, then $\operatorname{hcf}(r_1, r_2) = \operatorname{hcf}(r_2, r_3)$.

The proofs are easy exercises using the definition of hcf, and in particular (ii)
is true since the equation $r_1 = q_1 r_2 + r_3$ shows that any common factor of
$r_1$ and $r_2$ must also divide $r_3$, and any common factor of $r_2$ and $r_3$ must also
divide $r_1$.

To find the hcf of $r_1$ and $r_2$, use the division algorithm repeatedly to find
$q_i$, $r_i$ such that


$$\begin{aligned}
r_1 &= q_1 r_2 + r_3 && (r_3 < r_2)\\
r_2 &= q_2 r_3 + r_4 && (r_4 < r_3)\\
&\;\;\vdots\\
r_i &= q_i r_{i+1} + r_{i+2} && (r_{i+2} < r_{i+1})
\end{aligned}$$

Since $r_2 > r_3 > r_4 > \cdots$ the process cannot continue indefinitely, for
otherwise the set $\{r_2, r_3, r_4, \ldots\}$ would have no least element, contradicting
the well-ordering principle. Therefore at some stage $r_{i+2} = 0$, $r_{i+1} \neq 0$. This
value of $r_{i+1}$ is a highest common factor for $r_1$ and $r_2$. This is a consequence of
statements (i) and (ii) above, which show that

$$\operatorname{hcf}(r_1, r_2) = \operatorname{hcf}(r_2, r_3) = \cdots = \operatorname{hcf}(r_i, r_{i+1}) = r_{i+1}.$$


As an example, we find the hcf of 612 and 221 (allowing the usual operations 
of arithmetic as part of our contextual technique, since we have seen that 
they may be formalised within $\mathbf{N}_0$):

$$\begin{aligned}
612 &= 2 \times 221 + 170\\
221 &= 1 \times 170 + 51\\
170 &= 3 \times 51 + 17\\
51 &= 3 \times 17
\end{aligned}$$

Hence $\operatorname{hcf}(612, 221) = 17$.
Note that this method yields the hcf without factorising the numbers into 
primes, unlike the method often taught in schools. 
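A short Python sketch of the Euclidean algorithm as described above, assuming Python integers in place of $\mathbf{N}$; the function name `euclid_trace` is our own. Each pass applies the division algorithm (via the built-in `divmod`) and then replaces $(r_i, r_{i+1})$ by $(r_{i+1}, r_{i+2})$, which fact (ii) guarantees does not change the hcf. It stops at the first zero remainder, where fact (i) applies.

```python
# Illustrative sketch; euclid_trace is our own name, not from the text.
def euclid_trace(r1: int, r2: int) -> tuple[int, list[tuple[int, int, int, int]]]:
    """Run the Euclidean algorithm on r1, r2 >= 1.

    Returns the hcf and the rows (a, q, b, r) recording the equations
    a = q * b + r with 0 <= r < b, ending with the first row whose r is 0.
    """
    if r1 < 1 or r2 < 1:
        raise ValueError("r1 and r2 must be non-zero natural numbers")
    rows = []
    a, b = r1, r2
    while True:
        q, r = divmod(a, b)        # division algorithm: a = q*b + r, 0 <= r < b
        rows.append((a, q, b, r))
        if r == 0:                 # fact (i): b is the hcf of a and b
            return b, rows
        a, b = b, r                # fact (ii): hcf(a, b) = hcf(b, r)


h, rows = euclid_trace(612, 221)
for a, q, b, r in rows:
    print(f"{a} = {q} x {b} + {r}")
print("hcf =", h)
```

Run on 612 and 221 it prints the four equations of the worked example (the last as `51 = 3 x 17 + 0`) followed by `hcf = 17`, without ever factorising either number.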


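Proposition 8.19: If $h$ is the hcf of $r_1, r_2 \in \mathbf{N}$, and $n \in \mathbf{N}$, then the hcf of
$nr_1$ and $nr_2$ is $nh$.

Proof: If we take the steps in the Euclidean algorithm for $\operatorname{hcf}(r_1, r_2)$,
as written out above, and multiply through by $n$, we obtain a system of
equations

$$\begin{aligned}
nr_1 &= q_1 (nr_2) + nr_3 && (nr_3 < nr_2)\\
nr_2 &= q_2 (nr_3) + nr_4 && (nr_4 < nr_3)\\
&\;\;\vdots\\
nr_i &= q_i (nr_{i+1}) && (\text{recalling that } r_{i+2} = 0).
\end{aligned}$$

Uniqueness of the remainder at each stage implies that this is the Euclidean
algorithm for $\operatorname{hcf}(nr_1, nr_2)$, so the result is

$$\operatorname{hcf}(nr_1, nr_2) = nr_{i+1} = n \times \operatorname{hcf}(r_1, r_2) = nh.$$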
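The proof of proposition 8.19 rests on the observation that multiplying every equation of the algorithm by $n$ gives exactly the run of the algorithm on $nr_1, nr_2$: the same quotients, with every remainder scaled by $n$. The Python sketch below checks this on examples. It is an illustration under the assumption that Python integers model $\mathbf{N}$, and the helper names are hypothetical.

```python
# Illustrative sketch; euclid_rows and scaled_run_matches are our own names.
def euclid_rows(r1: int, r2: int) -> list[tuple[int, int, int, int]]:
    """Rows (a, q, b, r), meaning a = q*b + r, of the Euclidean algorithm on r1, r2 >= 1."""
    rows = []
    a, b = r1, r2
    while True:
        q, r = divmod(a, b)
        rows.append((a, q, b, r))
        if r == 0:
            return rows
        a, b = b, r


def scaled_run_matches(r1: int, r2: int, n: int) -> bool:
    """True if the run on n*r1, n*r2 is the run on r1, r2 with the same
    quotients and every other entry multiplied by n."""
    expected = [(n * a, q, n * b, n * r) for a, q, b, r in euclid_rows(r1, r2)]
    return euclid_rows(n * r1, n * r2) == expected


print(scaled_run_matches(612, 221, 7))        # True
print(euclid_rows(7 * 612, 7 * 221)[-1][2])   # 119, which is 7 x hcf(612, 221)
```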


From this follows a crucial result: 


Lemma 8.20: If $m, n \in \mathbf{N}$ and $p$ is a prime dividing $mn$, then either $p$
divides $m$ or $p$ divides $n$.

Proof: Suppose that $p$ does not divide $m$. Since $p$ is prime, its only factors
are 1 and $p$; so the hcf of $p$ and $m$ must be 1. By proposition 8.19 the hcf of $nm$
and $np$ is $n$. But $p$ divides $nm$ and $np$, so the definition of hcf implies that $p$
divides $n$.


Corollary 8.21: If $m_1, \ldots, m_r \in \mathbf{N}$ and a prime $p$ divides the product $m_1 \cdots m_r$, then
$p$ divides at least one of $m_1, \ldots, m_r$.

Proof: Use induction on $r \geq 2$; the case $r = 2$ is lemma 8.20.


The final theorem of this chapter states, in formal terms, that the factor- 
isation of a natural number into primes is unique, except for the possibility 
of writing the factors in a different order. 
Theorem 8.22 (Uniqueness of Prime Factorisation): Suppose that
$m \in \mathbf{N}$, $m \geq 2$, and

$$m = p_1^{e_1} \cdots p_r^{e_r} = q_1^{f_1} \cdots q_s^{f_s}$$

for primes $p_i$, $q_j$ (the $p_i$ distinct, and the $q_j$ distinct) and natural numbers
$e_i, f_j \geq 1$. Then $r = s$, and there is a bijection

$$\varphi : \{1, \ldots, r\} \to \{1, \ldots, s\} \text{ such that } p_i = q_{\varphi(i)} \text{ and } e_i = f_{\varphi(i)} \text{ for each } i.$$

Proof: Use induction on $k = e_1 + \cdots + e_r$. If $k = 1$ then $m = p_1$, $r = 1$, $e_1 = 1$.
Now $p_1$ divides the product of the $q_j$'s; hence, by corollary 8.21, $p_1$ divides $q_i$
for some $i$. Since $q_i$ is prime and $p_1 \neq 1$, $p_1 = q_i$. Using proposition 8.6(b) we
may divide through by $p_1$, obtaining

$$1 = q_1^{f_1} \cdots q_i^{f_i - 1} \cdots q_s^{f_s},$$

which is possible only if $s = 1$, $f_1 = 1$. Hence the two factorisations are given
by $m = p_1 = q_1$, and $\varphi$ may be taken to be the identity.

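Now suppose the result true for all values up to $k$, and suppose $e_1 + \cdots + e_r = k + 1$.
As for $k = 1$, we have $p_1 = q_i$ for some $i$. It follows that $e_1 = f_i$, or else, dividing out
powers of $p_1$ using proposition 8.6(b), one side would be divisible by $p_1$ and
the other not. Now we can divide out all powers of $p_1$ that occur, to get

$$p_2^{e_2} \cdots p_r^{e_r} = q_1^{f_1} \cdots q_{i-1}^{f_{i-1}} q_{i+1}^{f_{i+1}} \cdots q_s^{f_s}.$$

(If $r = 1$ the left-hand side is 1, and as in the case $k = 1$ this forces $s = 1$.)
Otherwise, by induction, $r - 1 = s - 1$, and there is a bijection
$\varphi' : \{2, \ldots, r\} \to \{1, \ldots, i-1, i+1, \ldots, s\}$ such that $p_j = q_{\varphi'(j)}$ and
$e_j = f_{\varphi'(j)}$ for $j = 2, \ldots, r$. It remains only to extend $\varphi'$ to the full set
$\{1, \ldots, r\}$ by defining

$$\varphi(1) = i, \qquad \varphi(j) = \varphi'(j) \text{ for } j = 2, \ldots, r,$$

and the induction step is proved.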
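Theorem 8.22 says that the exponent attached to each prime does not depend on how $m$ was written as a product. The Python sketch below computes those exponents by trial division (a method we chose for illustration; the text does not prescribe one), factorises the factors of three different products equal to 420, and adds up their exponents. The helper names are hypothetical, and the agreement it shows is what the theorem guarantees, not a proof of it.

```python
# Illustrative sketch; prime_factorisation and merge are our own names.
def prime_factorisation(m: int) -> dict[int, int]:
    """Return {p: e} with m equal to the product of the p**e, for m >= 2.

    Trial division: try p = 2, 3, 4, ... in turn and divide out every power of p.
    Composite p never divide what remains, because their prime factors are
    smaller and have already been divided out.
    """
    if m < 2:
        raise ValueError("m must be at least 2")
    exponents: dict[int, int] = {}
    p = 2
    while p * p <= m:
        while m % p == 0:
            exponents[p] = exponents.get(p, 0) + 1
            m //= p
        p += 1
    if m > 1:   # no factor up to sqrt(m) remains, so what is left is prime
        exponents[m] = exponents.get(m, 0) + 1
    return exponents


def merge(*factorisations: dict[int, int]) -> dict[int, int]:
    """Exponents of a product, obtained by adding the exponents of its factors."""
    total: dict[int, int] = {}
    for f in factorisations:
        for p, e in f.items():
            total[p] = total.get(p, 0) + e
    return dict(sorted(total.items()))


via_12_35 = merge(prime_factorisation(12), prime_factorisation(35))
via_14_30 = merge(prime_factorisation(14), prime_factorisation(30))
via_20_21 = merge(prime_factorisation(20), prime_factorisation(21))
print(via_12_35)                            # {2: 2, 3: 1, 5: 1, 7: 1}
print(via_12_35 == via_14_30 == via_20_21)  # True
```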


Reflections 


In this chapter we have made significant progress towards a formal approach 
to mathematics based on set-theoretic definitions and proof. At the begin- 
ning of the book you will have had a natural view of the arithmetic of whole 
numbers based on your experience. You knew all kinds of things, such as 
the idea that it didn’t matter how many terms you were adding together, 
you could perform the addition in any order and you would always get the 
same result. Essentially, your experience convinced you of this general prop- 
erty. However, in this chapter we have been able to reduce the whole of 
arithmetic to a system that satisfies just three axioms (N1), (N2), (N3) and,
on the assumption that such a system exists, we have been able to prove all 
the usual properties of arithmetic from these three axioms. On occasion the 
journey may have been tortuous, because we needed to base all our argu- 
ments on the explicit formal properties that are either axioms, definitions, 
or theorems proved logically from the axioms and definitions. Now we have 
established a rich collection of properties of natural numbers that we can use 
as a foundational context to build new theory. 

In future chapters we will use the same techniques to formulate set- 
theoretic axiomatic structures and further definitions within those struc- 
tures, safe in the knowledge that properties proved in a given axiomatic 
structure will continue to hold in new situations which satisfy the given ax- 
ioms and definitions. We will also use previously established results as part 
of our technique without explicitly revisiting the detail already proven wher- 
ever we are safe in the knowledge that the detail could be filled in as required. 
This enables us to focus on new ideas and produce increasingly sophisticated 
theories without obscuring the big picture with established detail. 


Exercises 


1. Define $m^n$ for $m, n \in \mathbf{N}_0$ by

$$m^0 = 1, \qquad m^{n+1} = m^n m.$$

Prove by suitable induction arguments that

$$m^{n+r} = m^n m^r, \qquad m^{nr} = (m^n)^r, \qquad (mn)^r = m^r n^r.$$


2. A sequence of natural numbers is a function $s : \mathbf{N} \to \mathbf{N}_0$. Write $s_n$
instead of $s(n)$ and denote $s$ by $(s_n)$. Given a sequence $(s_n)$, the $n$th
partial sum $\sigma_n$ of $(s_n)$ is defined recursively by

$$\sigma_1 = s_1, \qquad \sigma_{n+1} = \sigma_n + s_{n+1}.$$

The sum $\sigma_n$ is also written as $\sigma_n = s_1 + s_2 + \cdots + s_n$.
Prove by induction that

(a) $1 + 2 + \cdots + n = \tfrac{1}{2}n(n+1)$

(b) $1^2 + 2^2 + \cdots + n^2 = \tfrac{1}{6}n(n+1)(2n+1)$

(c) $1^3 + 2^3 + \cdots + n^3 = \tfrac{1}{4}n^2(n+1)^2$.
3. Define $n!$ for $n \in \mathbf{N}_0$ by

$$0! = 1, \qquad (n+1)! = n!\,(n+1).$$

Prove by induction on $n$ that $(n - r)!\,r!$ divides $n!$ for all $0 \leq r \leq n$.

For all $n, r \in \mathbf{N}_0$, $0 \leq r \leq n$, define $\binom{n}{r} \in \mathbf{N}_0$ to be $\dfrac{n!}{(n-r)!\,r!}$.


Show that 


and 


$$\binom{n}{r-1} + \binom{n}{r} = \binom{n+1}{r} \qquad (1 \leq r \leq n).$$


Use the last equality to prove by induction that for all $a, b, n \in \mathbf{N}_0$:

$$(a + b)^n = a^n + \binom{n}{1}a^{n-1}b + \cdots + \binom{n}{r}a^{n-r}b^r + \cdots + b^n.$$


4. Prove by induction, or otherwise,

(a) $1 \times 1! + 2 \times 2! + \cdots + n \times n! = (n+1)! - 1$,


o GEC) 
© eG) E 


5. Calculate the highest common factor of 2244 and 2145

(a) by the Euclidean algorithm,
(b) by factorising 2244 and 2145 into prime factors.

6. The Fibonacci numbers $(u_n)$ are defined recursively by

$$u_1 = 1, \qquad u_2 = 2, \qquad u_{n+1} = u_n + u_{n-1}.$$

Calculate $u_3$, $u_4$, $u_5$, $u_6$, and $u_7$. Prove that every natural number is a
sum of Fibonacci numbers. Is this expression unique?


7. If $x_1, \ldots, x_n$ are real numbers, prove that

$$|x_1| + \cdots + |x_n| \geq |x_1 + \cdots + x_n|.$$


8. Let $p/q$ be a fraction in lowest terms such that

$$\frac{1}{n+1} < \frac{p}{q} < \frac{1}{n}$$

for a natural number $n$. Show that $\dfrac{p}{q} - \dfrac{1}{n+1}$ is a fraction which, in its
lowest terms, has numerator less than $p$. Hence, by induction, prove
that every proper fraction $p/q$, where $p < q$, can be written as a finite
sum of distinct reciprocals

$$\frac{p}{q} = \frac{1}{n_1} + \cdots + \frac{1}{n_k}$$

where $n_1, \ldots, n_k$ are natural numbers.
Woda dod
For example, 5; = 5 + 3+ ip:

Use the technique developed in this question to express 2 as a sum
of reciprocals.

9. State and prove analogues of the division algorithm and the Euclidean
algorithm for polynomials

$$P(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0$$

with real coefficients. (Hint: If $a_n \neq 0$, then the degree of $P(x)$ is an
element of $\mathbf{N}_0$.)

10. The Tower of Hanoi is a puzzle consisting of $n$ discs, of different sizes,
which can be placed in three piles A, B, C. A disc may be ‘legally’
moved from the top of one pile to the top of another provided that it
is not placed on top of a smaller disc. Initially all the discs are placed
in one pile A, with the largest at the bottom and in decreasing order of
size up the pile; the other two piles are empty. Prove that there exists
a sequence of legal moves which will transfer all of the discs to pile B.

11. Are the following valid induction proofs?
(a) Everybody is bald. 


Proof: By induction on the number n of hairs. A man with no 
hairs is clearly bald. Adding one hair to a bald man is not enough 
to make him not bald, so if a man with n hairs is bald, so is a man 
with n + 1 hairs. By induction, however many hairs a man has, he 
is bald. 


(b) Everybody has the same number of hairs. 


Proof: By induction on the number of people. If this is 0 or 1, 
the statement is clearly true. Assume it for n. Take n + 1 people, 
remove one, then by the induction hypothesis the remaining n 
people have the same number of hairs. Remove a different one: 
the remaining n people again have the same number of hairs, so 
the first one removed has the same number of hairs as the rest. 


Hence all n + 1 people have the same number of hairs. 
(c) If $n$ straight lines are drawn across a circular disc, such that no
three meet in the same point, then they divide the disc into $2^n$
parts.

Proof: For $n = 1, 2$, the number of parts is 2, 4. Assume the result
true for $n$. Adding another line divides each region it passes
through into two, making $2^{n+1}$ in all. By induction, the statement
is proved.

(d) $n^2 - n + 41$ is a prime number (positive or negative) for every
natural number.

Proof:

$$\begin{aligned}
1^2 - 1 + 41 &= 41, & 2^2 - 2 + 41 &= 43,\\
3^2 - 3 + 41 &= 47, & 4^2 - 4 + 41 &= 53,\\
5^2 - 5 + 41 &= 61, & 6^2 - 6 + 41 &= 71, \ldots
\end{aligned}$$


(e) $1 + 3 + 5 + \cdots + (2n - 1) = n^2 + 1$.
Proof: If this is true at $n$, then add $2n + 1$ to each side to get

$$\begin{aligned}
1 + 3 + 5 + \cdots + (2n - 1) + (2n + 1) &= n^2 + 1 + (2n + 1)\\
&= (n + 1)^2 + 1.
\end{aligned}$$

This is the same formula with $n$ replaced by $n + 1$, so by induction
the formula is true for all natural numbers.
(f) $2 + 4 + \cdots + 2n = n(n + 1)$.

Proof: If $2 + 4 + \cdots + 2n = n(n + 1)$ then

$$2 + 4 + \cdots + 2n + 2(n + 1) = n(n + 1) + 2(n + 1)$$
so
$$2 + 4 + \cdots + 2(n + 1) = (n + 1)(n + 2).$$

By induction the formula is true for all $n$.


12. Induction with a difference. The arithmetic mean of the $n$ real numbers
$a_1, \ldots, a_n$ is $(a_1 + \cdots + a_n)/n$ and the geometric mean (if they are
all non-negative) is $\sqrt[n]{a_1 a_2 \cdots a_n}$. Prove that if $a_1, a_2, \ldots, a_n \geq 0$,
then

$$\frac{a_1 + \cdots + a_n}{n} \geq \sqrt[n]{a_1 a_2 \cdots a_n}.$$

You may find that a direct induction proof does not work. Try the
approach of Cauchy: Let $P(n)$ be the statement ‘$(a_1 + \cdots + a_n)/n \geq
\sqrt[n]{a_1 \cdots a_n}$ for all real numbers $a_1, \ldots, a_n \geq 0$’.

First establish by a standard induction that $0 \leq a < b \Rightarrow 0 \leq a^n <
b^n$, and deduce that for $a, b \geq 0$, $a^n < b^n \Leftrightarrow a < b$. If $\sqrt[n]{x}$ denotes the
positive $n$th root of $x \geq 0$, deduce that for $x, y \geq 0$, $x > y \Leftrightarrow \sqrt[n]{x} > \sqrt[n]{y}$.

$P(1)$ is trivial, and $P(2)$ may be established by considering the sign
of $\tfrac{1}{4}(a_1 + a_2)^2 - a_1 a_2$. Now prove $P(n) \Rightarrow P(2n)$. (Hint: Use $P(n)$
for $a_1, \ldots, a_n$ and also for $a_{n+1}, \ldots, a_{2n}$ and fit them together using
$P(2)$.) Then prove $P(n) \Rightarrow P(n - 1)$. (Given $a_1, \ldots, a_{n-1}$, let
$a_n = (a_1 + \cdots + a_{n-1})/(n - 1)$ and use $P(n)$ to show that

$$\sqrt[n]{a_1 \cdots a_{n-1} a_n} \leq a_n.$$

Raise to the $n$th power and simplify to get $P(n - 1)$.)
Now deduce that $P(n)$ is true for all $n \in \mathbf{N}$.

13. Proving a statement $P(n)$ true for all $n \in \mathbf{N}$ cannot always be achieved
by a simple induction argument. For example, Goldbach’s Conjecture
that every even integer is the sum of two primes, $2 = 1 + 1$, $4 = 2 + 2$,
$6 = 3 + 3$, $8 = 5 + 3$, $10 = 7 + 3, \ldots$ seems plausible (provided that 1 is
considered as a prime). Verify Goldbach’s Conjecture for every even
integer $2n \leq 50$. Can you see any pattern that might be amenable to
an induction proof? It is not known whether the conjecture is true or
false, but the related Odd Goldbach Conjecture, that every odd number
$\geq 7$ is a sum of three primes, was proved by Harald Helfgott in 2013.
CHAPTER 9


Real Numbers 


Our intuitive model $\mathbb{R}$ of the real numbers motivates the properties
that are desirable in a rigorous formulation. There should be two
binary operations, addition and multiplication, with arithmetical
properties that let us define subtraction and division. There should also be an
order relation, appropriately related to addition and multiplication, tailored
to take account of negative numbers. Finally, we should include the property
that distinguishes the real numbers from other number systems like $\mathbb{Z}$
and $\mathbb{Q}$: completeness. This property, which is distinctly more technical, was
introduced informally in chapter 2. We show that when these three types of
property (arithmetic, order, completeness) are formulated precisely, they
specify the real numbers uniquely, much like (N1)–(N3) specify $\mathbf{N}_0$.

There are several ways to express the required properties. The experience 
of the past century’s mathematics is that the following system of axioms is 
one of the best. We define the formal system $\mathbf{R}$ of real numbers as a
complete ordered field. We introduce the axioms in order of difficulty: field, then
order, then completeness. 


Axioms for the Reals: Let $\mathbf{R}$ be a set, equipped with two binary operations
$+$ and $\cdot$ (called addition and multiplication). If $a, b \in \mathbf{R}$ we call $a + b$
the sum of $a$ and $b$ and $a \cdot b$ the product. For traditional reasons we usually
omit the dot and write $ab$ for the product.

(a) Arithmetic
A set $\mathbf{R}$ with binary operations $+$ and $\cdot$ is said to be a field if for all
$a, b, c \in \mathbf{R}$

(A1) $a + b = b + a$
(A2) $a + (b + c) = (a + b) + c$
(A3) There exists $0 \in \mathbf{R}$ such that $a + 0 = a$ for all $a \in \mathbf{R}$
(A4) Given $a \in \mathbf{R}$ there exists $-a \in \mathbf{R}$ such that $a + (-a) = 0$
(M1) $ab = ba$
(M2) $a(bc) = (ab)c$
(M3) There exists $1 \in \mathbf{R}$ such that $1 \neq 0$, and $1a = a$ for all $a \in \mathbf{R}$
(M4) Given $a \in \mathbf{R}$, $a \neq 0$, there exists $a^{-1} \in \mathbf{R}$ such that $aa^{-1} = 1$
(D) $a(b + c) = ab + ac$.

The elements 0 and 1 are called the zero and unit elements of $\mathbf{R}$. By
(A1) and (M1) we also have $0 + a = a$, $(-a) + a = 0$, $a1 = a$, $a^{-1}a = 1$ (for $a \neq 0$), and
$(a + b)c = ac + bc$.
We define subtraction by

$$a - b = a + (-b)$$

and division by

$$a/b = ab^{-1} \text{ provided that } b \neq 0.$$

(b) Order
A field $\mathbf{R}$ is ordered if there exists a subset $\mathbf{R}^+ \subseteq \mathbf{R}$ such that

(O1) $a, b \in \mathbf{R}^+ \Rightarrow a + b,\ ab \in \mathbf{R}^+$
(O2) $a \in \mathbf{R} \Rightarrow a \in \mathbf{R}^+$ or $-a \in \mathbf{R}^+$
(O3) $(a \in \mathbf{R}^+) \,\&\, (-a \in \mathbf{R}^+) \Rightarrow a = 0$.

These axioms are designed to relate the order to the arithmetic in a
sensible way. The set $\mathbf{R}^+$ corresponds to our intuitive idea of the subset
of positive elements (recall that ‘positive’ includes 0). The usual
order relation is then defined by

$$a \geq b \Leftrightarrow a - b \in \mathbf{R}^+.$$

We check later that this really is an order relation.

(c) Completeness
Recall the following properties, defined for the intuitive concept $\mathbb{R}$ in
chapter 2:

An element $a \in \mathbf{R}$ is an upper bound for a subset $S \subseteq \mathbf{R}$ if $a \geq s$ for
all $s \in S$.

A set $S$ with an upper bound is said to be bounded above. An
element $\lambda$ of $\mathbf{R}$ is a least upper bound (lub) for $S$ if

(i) $\lambda \geq s$ for all $s \in S$ ($\lambda$ is an upper bound)
(ii) $a \geq s$ for all $s \in S \Rightarrow a \geq \lambda$ ($\lambda$ is the least among the upper
bounds).

We can now state the final completeness axiom:

(C) If $S$ is a non-empty subset of $\mathbf{R}$ and $S$ is bounded above, then $S$
has a least upper bound in $\mathbf{R}$.
A structure $\mathbf{R}$ satisfying all 13 of the above axioms, (A1)–(A4), (M1)–(M4),
(D), (O1)–(O3), and (C), is called a complete ordered field. (Later we prove
that such a structure is essentially unique.)

We could introduce a new axiom, the existence of a complete ordered field. 
However, we do not wish to proliferate axioms unnecessarily. It turns out 
that once we postulate the existence of a system $\mathbf{N}_0$ satisfying Peano’s axioms,
we can derive from it a related system $\mathbf{R}$ that is a complete ordered field. We
first extend $\mathbf{N}_0$ to construct a formal version $\mathbf{Z}$ of the integers $\mathbb{Z}$; then we
extend $\mathbf{Z}$ to construct a formal version $\mathbf{Q}$ of the rational numbers $\mathbb{Q}$. Finally
we develop $\mathbf{R}$ from $\mathbf{Q}$. This final construction is technically more difficult,
mainly because of that vital completeness axiom, but also because we have 
13 axioms to check. Each extension is inspired by the intuitive development 
that we have already encountered at school level. 

This sequence of constructions is a part of our mathematical heritage, 
and all mathematicians should see it at least once in their lives. When first 
discovered, these constructions resolved frontier questions about the foun- 
dations of mathematics, and in particular answered the question ‘what is a 
number?’ In retrospect, however, the main importance of these constructions
today is to demonstrate that the existence of $\mathbf{N}_0$, plus set theory, implies
the existence of $\mathbf{R}$.

The main point to appreciate is that this construction is possible. Once 
it has been performed, everything else can be based on the properties 
(A1)-(A4), (M1)-(M4), (D), (O1)-(O3), and (C). The construction itself is 
a hangover from the nineteenth century, when the natural numbers were 
accepted as the basis of mathematics without enquiring about their logical 
justification, but real numbers were imperfectly understood and therefore 
seemed mysterious. At that time, it was important to prove that the real
numbers are genuine mathematical objects. That demonstration was effected by
constructing $\mathbf{R}$ from $\mathbf{N}_0$. Nowadays, having seen that this can be done, the
psychological and philosophical problems involved seem less serious: it is
logically equivalent to postulate the existence of $\mathbf{R}$, rather than that of $\mathbf{N}_0$. In
fact, it is much more convenient to start with $\mathbf{R}$, because it is straightforward
to locate within it a chain of subsets

$$\mathbf{R} \supseteq \mathbf{Q} \supseteq \mathbf{Z} \supseteq \mathbf{N}_0$$


that provides the rationals, integers, and natural numbers. On the other 
hand, the Peano axioms seem very simple and natural, and we find it easy 
to believe that such a system exists, whereas the 13 axioms for a complete 
ordered field are harder to swallow. 

Chapter 10 explores this alternative approach to the inner constituents of 
$\mathbf{R}$, and should make its advantages clear. We begin in this chapter with the
construction of $\mathbf{R}$ from $\mathbf{N}_0$.
Preliminary Arithmetical Deductions


Before we begin to build an axiomatic structure $\mathbf{Z}$ for the integers, we can
obtain some useful clues. Our intuitive model $\mathbb{Z}$ shows that we should not
expect all of the properties of a field to hold. Specifically, not all elements of
$\mathbb{Z}$ have multiplicative inverses (reciprocals) in $\mathbb{Z}$, axiom (M4). However, we
expect all of the other arithmetical axioms to hold.

Some standard algebraic terminology helps us keep track of which properties
are under consideration.

Definition 9.1: A set $R$ having two binary operations satisfying (A1)–(A4),
(M1)–(M3), and (D) is a ring; more accurately a commutative ring. (The word
‘ring’ is usually applied to any system satisfying a less restrictive set of axioms
omitting (M1). Since we rarely deal with non-commutative rings in this text,
we omit ‘commutative’.)

If, further, there exists a subset $R^+$ of $R$ satisfying (O1)–(O3), $R$ is an
ordered ring.


We now make some elementary deductions from these axioms, which, as 
well as being useful in their own right, are good practice in the axiomatic 
style. 


Proposition 9.2: If $R$ is a ring and for some $x \in R$, $a + x = a$ for all $a \in R$,
then $x = 0$. If $xa = a$ for all $a \in R$, then $x = 1$.

Proof: Put $a = 0$, so that $0 + x = 0$. But $0 + x = x$ by (A3) and (A1), so $x = 0$.
Similarly, putting $a = 1$ gives $x = x1 = 1$.


This proposition shows that the zero and unity elements of R are unique: 
no other elements have similar properties. In the same way the negative -a 
of an element a is uniquely determined: 


Proposition 9.3: If $x + a = 0$ for elements $x, a$ of a ring $R$, then $x = -a$.
Proof:

$$\begin{aligned}
x &= x + 0 && \text{by (A3)}\\
&= x + (a + (-a)) && \text{by (A4)}\\
&= (x + a) + (-a) && \text{by (A2)}\\
&= 0 + (-a) && \text{since } x + a = 0\\
&= -a && \text{by (A1) and (A3).}
\end{aligned}$$

If $R$ is a field then multiplicative inverses are uniquely determined (for
non-zero elements) and the proof is analogous.


Proposition 9.4: If $R$ is a ring, then for all $a \in R$, $-(-a) = a$.

Proof: By definition, $a + (-a) = 0$. By proposition 9.3, $a = -(-a)$.

Proposition 9.5: If $R$ is a ring then $a0 = 0a = 0$ for all $a \in R$.
Proof:

$$\begin{aligned}
a0 &= a(0 + 0) && \text{by (A3)}\\
&= a0 + a0 && \text{by (D).}
\end{aligned}$$

Adding $-(a0)$ to each side, we obtain

$$\begin{aligned}
0 &= a0 + (-(a0)) && \text{by (A4)}\\
&= (a0 + a0) + (-(a0)) && \text{since } a0 = a0 + a0\\
&= a0 + (a0 + (-(a0))) && \text{by (A2)}\\
&= a0 + 0 && \text{by (A4)}\\
&= a0 && \text{by (A3).}
\end{aligned}$$

Then $0a = 0$ by (M1).

Proposition 9.6: If $R$ is a ring and $a, b \in R$ then $-(ab) = (-a)b = a(-b)$.
Proof:

$$\begin{aligned}
ab + (-a)b &= (a + (-a))b && \text{by (D) and (M1)}\\
&= 0b && \text{by (A4)}\\
&= 0 && \text{by proposition 9.5.}
\end{aligned}$$

Hence $(-a)b = -(ab)$ by proposition 9.3. The rest follows by (M1).


From here it is easy to make further deductions, such as 


$$(-a)(-b) = ab, \qquad (-1)a = -a.$$

If $R$ is a field we may also prove that $(a^{-1})^{-1} = a$ when $a \neq 0$.
Defining subtraction and division as indicated above, we may also verify
the expected properties, for example (for $b, d \neq 0$)

$$\begin{aligned}
(-a)/b &= a/(-b) = -(a/b)\\
(a/b) + (c/d) &= (ad + bc)/(bd)\\
(a/b)(c/d) &= (ac)/(bd).
\end{aligned}$$


The details are left as exercises which will make more sense if you think them 
through and explain them to yourself. 
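Propositions 9.2–9.6 hold in every ring, not only in the integers. As a sanity check, not a proof, the Python sketch below tests them by brute force in the integers modulo 6, realised with Python's `%` operator. We assume the standard fact that these residues form a commutative ring; since 2 has no inverse there, it is a ring that is not a field. The helper names `add`, `mul`, and `neg` are our own.

```python
# Illustrative sketch; the ring Z/6 and the helper names are our own choices.
N = 6  # the integers modulo 6: a commutative ring in which 2 has no inverse

elements = range(N)


def add(a: int, b: int) -> int:
    return (a + b) % N


def mul(a: int, b: int) -> int:
    return (a * b) % N


def neg(a: int) -> int:
    """The additive inverse -a, i.e. the x with a + x = 0 (unique by proposition 9.3)."""
    return (-a) % N


for a in elements:
    assert add(a, neg(a)) == 0                      # (A4)
    assert neg(neg(a)) == a                         # proposition 9.4
    assert mul(a, 0) == mul(0, a) == 0              # proposition 9.5
    assert mul(neg(1), a) == neg(a)                 # (-1)a = -a
    for b in elements:
        assert neg(mul(a, b)) == mul(neg(a), b) == mul(a, neg(b))  # proposition 9.6
        assert mul(neg(a), neg(b)) == mul(a, b)                    # (-a)(-b) = ab

# Proposition 9.2: 0 is the only x with a + x = a for every a.
print([x for x in elements if all(add(a, x) == a for a in elements)])  # [0]
# (M4) fails: no x satisfies 2x = 1, so this ring is not a field.
print([x for x in elements if mul(2, x) == 1])                         # []
```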
Next we look at order properties. At this point we take advantage of the
discussion of proof in chapter 7, focusing on the new elements of the theory. 
The arithmetical properties of addition and multiplication will be considered 
sufficiently well established that we no longer need to quote chapter and 
verse when using them. We work in a context where the properties of arith- 
metic can be used without the need for explicit proof, and focus on the new 
properties of order. 


COMMENT. It is common for students to be very successful in manipulating
expressions using the operations of arithmetic, but to make unexpected errors
when dealing with order relations. For instance, if we know that $ab > c$,
we may be tempted to divide by $b$ to get $a > c/b$. That looks plausible, but the
inference fails if $b$ is negative, and $c/b$ is undefined if $b$ is zero. In the sections
that follow, it is important to operate carefully with order relationships using
the formal definitions.


Preliminary Deductions about Order 


In this section, $R$ is any ordered ring. Its order relation is defined by

$$a \geq b \Leftrightarrow a - b \in R^+. \qquad (9.1)$$

It follows that $a \geq 0 \Leftrightarrow a \in R^+$, so

$$R^+ = \{a \in R \mid a \geq 0\}. \qquad (9.2)$$

Using (O1)–(O3) we now establish:
Proposition 9.7: The relation $\geq$ is a weak order on $R$.
Proof: We must verify the three properties

(WO1) $a \geq b \,\&\, b \geq c \Rightarrow a \geq c$
(WO2) Either $a \geq b$ or $b \geq a$
(WO3) $a \geq b \,\&\, b \geq a \Rightarrow a = b$.

For (WO1), $a \geq b \,\&\, b \geq c \Rightarrow a - b,\ b - c \in R^+$. By (O1), $(a - b) + (b - c) \in R^+$,
so $a - c \in R^+$, so $a \geq c$. (This is our first taste of ‘arithmetic without tears’: an
axiomatic proof that $(a - b) + (b - c) = a - c$ takes several steps, all omitted
here.)

For (WO2), (O2) implies that either $a - b \in R^+$ or $b - a = -(a - b) \in R^+$.
Therefore $a \geq b$ or $b \geq a$.

For (WO3), if both $a - b$ and $b - a = -(a - b)$ are in $R^+$ then $a - b = 0$ by (O3), so $a = b$.
The order relation behaves appropriately with respect to the arithmetic:

Proposition 9.8: For all $a, b, c, d \in R$,

(a) $a \geq b \,\&\, c \geq d \Rightarrow a + c \geq b + d$,
(b) $a \geq b \geq 0 \,\&\, c \geq d \geq 0 \Rightarrow ac \geq bd$.

Proof: Translate the definition of $\geq$ using (9.1), and do the arithmetic.

In the definition of an ordered field (or ordered ring) we can replace
(O1)–(O3) by the properties stated in propositions 9.7 and 9.8, and use the
relation $\geq$ to define the set $R^+$ by working (9.2) the other way round. Which
approach we use is a matter of taste.

The modulus can be defined in an ordered ring by setting

$$|a| = \begin{cases} a & \text{if } a \in R^+\\ -a & \text{if } -a \in R^+. \end{cases}$$

It can then be proved that $|a| \geq 0$ for all $a \in R$ and, by repeating the argument
of chapter 2 in this formal context, that

$$|a + b| \leq |a| + |b|, \qquad |ab| = |a|\,|b|.$$


Now we have enough technique to carry out the construction of the integers, 
rationals, and reals. 


Construction of the Integers 


To get from $\mathbf{N}_0$ to the integers we must introduce negative elements. In fact,
we consider differences $m - n$ of natural numbers. These differences are
definable as natural numbers when $m \geq n$, but not when $m < n$. Our task is to
give m - n a meaning no matter which of m or n is larger. 

The idea is to relate subtraction to addition, which of course is how we 
were taught about subtraction in the first place. If $m, n, r, s \in \mathbf{N}_0$ and $m \geq n$,
$r \geq s$, then

$$m - n = r - s \Leftrightarrow m + s = r + n.$$

The right-hand side makes sense without restrictions on $m, n, r, s$. This gives
a clue. To construct things that behave like differences $m - n$, take the set
$\mathbf{N}_0 \times \mathbf{N}_0$ of ordered pairs $(m, n)$, where $m, n \in \mathbf{N}_0$, and define a relation $\sim$
on this set by

$$(m, n) \sim (r, s) \Leftrightarrow m + s = r + n.$$

It turns out that $\sim$ is an equivalence relation, and the proof requires only
arithmetic in $\mathbf{N}_0$. We can then define the integers $\mathbf{Z}$ to be the set of
equivalence classes for $\sim$. The equivalence class of $(m, n)$ corresponds to our
intuitive concept of the difference $m - n$, and the formal proof runs as follows.
Let $\langle m, n\rangle$ denote the equivalence class of $(m, n)$. By the definition of $\sim$,

$$\langle m, n\rangle = \langle r, s\rangle \Leftrightarrow m + s = r + n.$$

Define addition and multiplication on $\mathbf{Z}$ by

$$\begin{aligned}
\langle m, n\rangle + \langle p, q\rangle &= \langle m + p,\ n + q\rangle,\\
\langle m, n\rangle\langle p, q\rangle &= \langle mp + nq,\ mq + np\rangle.
\end{aligned} \qquad (9.3)$$


These definitions are motivated by thinking of $\langle m, n\rangle$ as ‘$m - n$’ and translating
the sum and product into expressions involving such differences:

$$\begin{aligned}
(m - n) + (p - q) &= (m + p) - (n + q),\\
(m - n)(p - q) &= (mp + nq) - (mq + np).
\end{aligned}$$

We need to check that the operations (9.3) are well defined in the sense
of chapter 4. So suppose that $\langle m, n\rangle = \langle m', n'\rangle$ and $\langle p, q\rangle = \langle p', q'\rangle$. Then
$m + n' = m' + n$ and $p + q' = p' + q$. Now

$$\begin{aligned}
(m + p) + (n' + q') &= (m + n') + (p + q')\\
&= (m' + n) + (p' + q)\\
&= (m' + p') + (n + q).
\end{aligned}$$

Hence $\langle m + p, n + q\rangle = \langle m' + p', n' + q'\rangle$, and addition is well defined.
Multiplication is treated in the same way.
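The construction can be imitated directly on representatives. In the Python sketch below, a `Pair(m, n)` represents $\langle m, n\rangle$, equality of classes is tested with $\sim$, and the operations (9.3) act on representatives. The names are hypothetical, and non-negative Python integers stand in for $\mathbf{N}_0$ (an assumption the code does not enforce). Swapping in equivalent representatives may change the pair produced, but not its class, which is what well-definedness means.

```python
# Illustrative sketch; Pair and the helper names are our own, not from the text.
from typing import NamedTuple


class Pair(NamedTuple):
    """A representative (m, n) of the class <m, n>, thought of as 'm - n'.
    Both entries are meant to be non-negative integers, standing in for N_0."""
    m: int
    n: int


def equivalent(a: Pair, b: Pair) -> bool:
    """(m, n) ~ (r, s) iff m + s = r + n; only addition is used."""
    return a.m + b.n == b.m + a.n


def add(a: Pair, b: Pair) -> Pair:
    """<m, n> + <p, q> = <m + p, n + q>, computed on representatives."""
    return Pair(a.m + b.m, a.n + b.n)


def mul(a: Pair, b: Pair) -> Pair:
    """<m, n><p, q> = <mp + nq, mq + np>, computed on representatives."""
    return Pair(a.m * b.m + a.n * b.n, a.m * b.n + a.n * b.m)


def neg(a: Pair) -> Pair:
    """Additive inverse: -<m, n> = <n, m>."""
    return Pair(a.n, a.m)


# (3, 5) ~ (0, 2) both stand for -2, and (7, 1) ~ (10, 4) both stand for 6.
a, a2 = Pair(3, 5), Pair(0, 2)
b, b2 = Pair(7, 1), Pair(10, 4)
print(equivalent(add(a, b), add(a2, b2)))      # True: both sums represent 4
print(equivalent(mul(a, b), mul(a2, b2)))      # True: both products represent -12
print(equivalent(add(a, neg(a)), Pair(0, 0)))  # True: axiom (A4)
```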

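It is now a simple but long-winded exercise to show that $\mathbf{Z}$ is an ordered
ring, taking

$$\mathbf{Z}^+ = \{\langle m, n\rangle \in \mathbf{Z} \mid m \geq n \text{ in } \mathbf{N}_0\}.$$

Proposition 9.9: With the above operations, $\mathbf{Z}$ is an ordered ring.
Proof: We must check the axioms (A1)–(A4), (M1)–(M3), (D), and
(O1)–(O3). In all cases we use the definition of $\mathbf{Z}$ to restate the required
property in $\mathbf{N}_0$, and verify it by arithmetic.
(A1) Let $a = \langle m, n\rangle$, $b = \langle p, q\rangle$. Then

$$\begin{aligned}
a + b &= \langle m + p,\ n + q\rangle\\
&= \langle p + m,\ q + n\rangle && \text{by arithmetic in } \mathbf{N}_0\\
&= b + a.
\end{aligned}$$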
(M1) To prove $ab = ba$, we need to show $\langle m, n\rangle\langle p, q\rangle = \langle p, q\rangle\langle m, n\rangle$. By (9.3)
this requires showing that

$$\langle mp + nq,\ mq + np\rangle = \langle pm + qn,\ pn + qm\rangle,$$

which follows because $mp + nq = pm + qn$ and $mq + np = pn + qm$
in $\mathbf{N}_0$.

(A3) The simplest way to express 0 as a difference is to form $0 - 0$. So we
consider the element $\langle 0, 0\rangle$ of $\mathbf{Z}$. Now

$$\langle m, n\rangle + \langle 0, 0\rangle = \langle m + 0,\ n + 0\rangle = \langle m, n\rangle.$$

Thus $\langle 0, 0\rangle$ acts as a ‘zero’ element.
(A4) The additive inverse of $m - n$ ought to be $n - m$, so we compute:

$$\langle m, n\rangle + \langle n, m\rangle = \langle m + n,\ n + m\rangle = \langle m + n,\ m + n\rangle.$$

Are we in trouble? No, because $(m + n, m + n)$ is equivalent, under
$\sim$, to $(0, 0)$, since $(m + n) + 0 = 0 + (m + n)$. Therefore $\langle m + n, m + n\rangle = \langle 0, 0\rangle$,
and we have proved that

$$\langle m, n\rangle + \langle n, m\rangle = \langle 0, 0\rangle,$$

which is what we want. Now $\sim$ is starting to make its presence felt.


Proofs of the remaining axioms for arithmetic follow similar lines. We 
could easily write them out for you. But if we did, you might simply read 
them through and commit them to memory to pass a test. To make sense 
of them, it is time to work through them for yourself. The effort of deriv- 
ing and explaining the links to yourself is more likely to set up a coherent 
schema of connections in your mind, which you can build on in the future. 
Mathematics involves active thinking. It is not a spectator sport. 


The next step is to recover the usual notation for integers as positive or 
negative natural numbers. 

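Any element of $\mathbf{Z}^+$ is of the form $\langle m, n\rangle$ where $m \geq n$, so can be written as
$\langle m - n, 0\rangle$. Thus every element of $\mathbf{Z}^+$ is of the form $\langle r, 0\rangle$ for $r \in \mathbf{N}_0$. Now
axiom (O2) tells us that for any $a \in \mathbf{Z}$, either $a \in \mathbf{Z}^+$ or $-a \in \mathbf{Z}^+$, hence either
$a = \langle r, 0\rangle$ or $a = -\langle r, 0\rangle = \langle 0, r\rangle$.

Define a map $f : \mathbf{N}_0 \to \mathbf{Z}^+$ by $f(n) = \langle n, 0\rangle$. It is easily seen that $f$ is a
bijection, and that

$$\begin{aligned}
f(m + n) &= f(m) + f(n),\\
f(mn) &= f(m)f(n),\\
m \geq n &\Leftrightarrow f(m) \geq f(n).
\end{aligned}$$

That is, $f$ is an order isomorphism, in the sense of chapter 8.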
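A small Python check of these properties of $f$, again working with representatives $(m, n)$ of classes $\langle m, n\rangle$ and non-negative Python integers for $\mathbf{N}_0$ (an assumption for illustration; the helper names are our own). The order on $\mathbf{Z}$ comes from (9.1): $\langle m, n\rangle \geq \langle p, q\rangle$ exactly when $\langle m, n\rangle - \langle p, q\rangle = \langle m + q, n + p\rangle$ lies in $\mathbf{Z}^+$, that is, when $m + q \geq n + p$.

```python
# Illustrative sketch; f and the helper names are our own, working on representatives.
def f(n: int) -> tuple[int, int]:
    """The map f : N_0 -> Z^+, n |-> <n, 0>, returning a representative pair."""
    return (n, 0)


def equivalent(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """(m, n) ~ (r, s) iff m + s = r + n."""
    return a[0] + b[1] == b[0] + a[1]


def add(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    return (a[0] + b[0], a[1] + b[1])


def mul(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    return (a[0] * b[0] + a[1] * b[1], a[0] * b[1] + a[1] * b[0])


def geq(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """<m, n> >= <p, q> iff <m + q, n + p> is in Z^+, i.e. iff m + q >= n + p."""
    return a[0] + b[1] >= a[1] + b[0]


ok = all(
    equivalent(f(m + n), add(f(m), f(n)))
    and equivalent(f(m * n), mul(f(m), f(n)))
    and (m >= n) == geq(f(m), f(n))
    for m in range(20)
    for n in range(20)
)
print(ok)  # True
```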
This leads to a technical problem, which we have met before. We do not
have $\mathbf{N}_0 \subseteq \mathbf{Z}$, as we might have hoped: instead, $\mathbf{N}_0$ is order isomorphic to
the subset $\mathbf{Z}^+ \subseteq \mathbf{Z}$. The elements of $\mathbf{N}_0$ and $\mathbf{Z}^+$ are definitely different
mathematical objects: the first is a set of numbers, the second is a set of
equivalence classes of ordered pairs. However, they behave in exactly the same
manner.

There are various ways to get round this problem. One is to replace the
elements of $\mathbf{Z}^+$ by the corresponding elements of $\mathbf{N}_0$, creating a hybrid
system with the elements $0, 1, 2, \ldots$ in $\mathbf{N}_0$ as the non-negative integers and
the equivalence classes $\langle m, m + n\rangle$ with $n \neq 0$ as the negative integers $-n$. The
diagram below should make the idea clear.


Fig. 9.1: $\mathbf{N}_0$ as a subset of $\mathbf{Z}$


This hybrid system contains $\mathbf{N}_0$ as a genuine subset, and extends to include
elements of the form $\langle m, m + n\rangle$. Such an element is the additive inverse of $n$,
so we can change notation to $-n$ without getting into trouble.

However, this method is inelegant and lacks the aesthetic simplicity
desired in mathematics. This kind of complication will escalate as we go on
to construct the rational numbers $\mathbf{Q}$ from $\mathbf{Z}$ and then the real numbers $\mathbf{R}$
from $\mathbf{Q}$. At each stage the smaller number system is isomorphic to a
subsystem of the larger system, but it is not actually a subset as such. We could use
a similar trick to replace a subset of $\mathbf{Q}$ by $\mathbf{Z}$, and a subset of $\mathbf{R}$ by $\mathbf{Q}$, but the
elegance of the constructions gets lost.

Mathematicians take a more pragmatic route. They ‘identify’ N₀ and Z*, that is, they ignore the technical set-theoretic distinction between them for purposes of arithmetic and order. This causes no harm because these two systems have exactly the same mathematical structure as regards arithmetic and order. If we ignore the distinction, we can consider N₀ to be a subsystem of Z. This fits with how the human mind simplifies the situation by thinking of N₀ and Z* as different ways to represent the same underlying mathematical concept.

The mathematical justification of this approach will become clear when 
we construct the rational numbers and the real numbers. We will then 
prove that the axioms for a complete ordered field define the real numbers 
uniquely, in the sense that any two systems that satisfy all the axioms are 
order isomorphic. Therefore, up to isomorphism, there is only one complete 
ordered field. Inside it, genuinely inside as subsets, are systems that correspond
to the natural numbers, the integers, and the rational numbers. We can
then mentally ‘throw away’ the set-theoretic scaffolding that we are build- 
ing in this chapter, by replacing the systems we have constructed by these 
isomorphic subsystems of R. 

In case you’re wondering why any of this is necessary: actually, you've 
encountered similar problems before. When you did fractions, at some 
stage you had to sort out that 2/1 is the same as 2: some fractions can be 
whole numbers. Similarly, when you did decimals, you got used to replacing 
1·0000… by 1. Technically the first is an infinite decimal that just happens
to have lots of zeros. It behaves like the whole number 1, but it’s not written 
that way. But clearly, thinking of it as being equal to 1 does no harm. 

From a psychological viewpoint, the real numbers are conceived by the 
human mind as a unique ‘crystalline concept’ that has specific properties yet 
can be represented flexibly in different equivalent forms. In this case the real 
numbers can be defined axiomatically as a list of 13 axioms, represented geo- 
metrically as points on a number line, or symbolically as infinite decimals. If 
the distinction actually matters, you can always sort it out; usually, it doesn’t. 
Once we have this coherent overall structure, we have a perfect platform 
from which to view ‘the’ real numbers as a unique, but flexible, mathematical 
entity. 

To reach that stage, however, we must first go through the technicalities of 
constructing the rationals from the integers and the reals from the rationals. 
This process shows that all of these number systems are consequences of the 
Peano axioms, which characterise the natural numbers. 


Construction of Rational Numbers 


We construct the rational numbers from the integers by following a similar 
strategy to the one used to construct the integers from the natural num- 
bers. But what matters now is not the difference m - n between two natural 
numbers, but the quotient m/n of two integers. So, starting from Z, we must 
introduce a larger set Q for which quotients m/n are defined. 

To do this, let S be the set of all ordered pairs (m, n) where m, n ∈ Z and n ≠ 0. Define a relation ~ by

$$(m,n) \sim (p,q) \iff mq = np.$$


This is inspired by the property that m/n = p/q if and only if mq = np. Now 
define Q to be the set of equivalence classes for ~. Anticipating the final 
result, we use the notation m/n for the equivalence class of (m, n). Define operations by

$$
\begin{aligned}
m/n + p/q &= (mq + np)/nq,\\
(m/n)(p/q) &= mp/nq.
\end{aligned}
$$


Theorem 9.10: These operations define the structure of a field on Q. 


Proof: The details are left to the reader (offering an opportunity to build up 
the mental links to create a coherent personal schema for these ideas). First, 
check that the operations are well defined; then go through the whole list of 
axioms, one by one. Use the proof of proposition 9.8 as a model. 

Once more, it really is important to think this proof through for yourself. 
We could put it in, but it is seldom helpful to read through someone else’s 
long calculations when they are routine. To help, here’s a hint: if m ≠ 0, the
multiplicative inverse of m/n is n/m.
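The same experiment works for Q. In this sketch (helper names hypothetical) a rational is a pair (m, n) of Python ints with n ≠ 0, compared by the cross-multiplication rule mq = np and combined by the formulas for m/n + p/q and (m/n)(p/q) given above. It spot-checks well-definedness on a few samples, together with the hint about inverses; it is no substitute for the proof you are asked to write.

```python
from itertools import product

# Rationals as pairs (m, n) of Python ints with n != 0, read as "m/n".
# (m, n) ~ (p, q) iff m*q == n*p, as in the definition of Q above.

def equivalent(a, b):
    (m, n), (p, q) = a, b
    return m * q == n * p

def add(a, b):
    (m, n), (p, q) = a, b
    return (m * q + n * p, n * q)      # m/n + p/q = (mq + np)/nq

def mul(a, b):
    (m, n), (p, q) = a, b
    return (m * p, n * q)              # (m/n)(p/q) = mp/nq

def inverse(a):
    m, n = a
    if m == 0:
        raise ZeroDivisionError("0/n has no multiplicative inverse")
    return (n, m)                      # the hint: (m/n)^(-1) = n/m when m != 0

# Well-definedness spot-check: replacing (m, n) by the equivalent (km, kn)
# never changes the class of a sum or product.
samples = [(1, 2), (-3, 4), (0, 5), (7, -3)]
for a, b in product(samples, repeat=2):
    for k in (2, -1, 5):
        a2 = (k * a[0], k * a[1])
        assert equivalent(a, a2)
        assert equivalent(add(a, b), add(a2, b))
        assert equivalent(mul(a, b), mul(a2, b))

# Every non-zero sample times its inverse is equivalent to 1/1.
for a in samples:
    if a[0] != 0:
        assert equivalent(mul(a, inverse(a)), (1, 1))
```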


We have now set up the arithmetic of Q, but not its order relation. We 
define an ordering by specifying the positive elements: 


$$\mathbb{Q}^{*} = \{\, m/n \in \mathbb{Q} \mid m, n \in \mathbb{Z}^{*},\ n \neq 0 \,\}.$$


Theorem 9.11: With the above definition, Q is an ordered field.


Proof: Once more we want you to construct the idea in your own mind, by 
thinking through the proof for yourself. 


We want the integers to be a subset of the rationals, but once again this 
is true only up to isomorphism. It’s the old problem of n/1 being technically 
different from n, but behaving in exactly the same way. We solve it by proving that the map g: Z → Q defined by g(n) = n/1 preserves the arithmetical operations and the order:

$$
\begin{aligned}
g(m+n) &= g(m) + g(n),\\
g(mn) &= g(m)\,g(n),\\
m > n &\Rightarrow g(m) > g(n)
\end{aligned}
$$

for all m, n ∈ Z. All three are straightforward. Therefore g is an order isomorphism from the integers Z to the elements of Q of the form m/1.

Since every rational m/n can be written as (m/1)(1/n) = (m/1)(n/1)⁻¹, identifying n with n/1 does not lead to any conflict in notation, and corresponds to the usual intuitive model. This identification lets us think of Z as a subsystem of Q, just as the natural numbers can be thought of as a subsystem of the integers.


Construction of Real Numbers 


The construction of the real numbers is more complicated, and it can be 
carried out in several different ways. It is possible, though technically awk- 
ward, to construct them as infinite decimals, along the lines of chapter 2. 
However, we saw there that the use of approximating sequences of rationals 
has technical advantages. Monotonic sequences are especially easy to handle, but we shall use more general ‘Cauchy sequences’, which we will define
in a moment. As in previous sections, many routine details will be omit- 
ted, and for the same reason: the broad outline becomes more easily visible 
when the details merge into the conceptual background. It remains essential 
for you to think through the relationships for yourself; to understand them 
in a coherent way that helps to build up a flexible personal insight into the 
mathematical structure. 


Sequences of Rationals 


The main idea when constructing the real numbers is to associate each real 
number with an infinite sequence of rational numbers, which in some sense 
form better and better approximations to the real number concerned. Trun- 
cating an infinite decimal further and further to the right is one way to do 
this, but the mathematics is simpler if we avoid being that specific. 

As in chapter 5, but replacing the informal N by the formal version N, a sequence of rationals may be formally defined as a function

$$s\colon \mathbb{N} \to \mathbb{Q}.$$

We write sₙ for s(n) and denote the sequence by $(s_n)_{n\in\mathbb{N}}$ or by (s₁, s₂, s₃, …), or just by (sₙ).

Let S be the set of all sequences of rationals. We define addition and multiplication within S by

$$
\begin{aligned}
(a_n) + (b_n) &= (a_n + b_n),\\
(a_n)(b_n) &= (a_n b_n).
\end{aligned}
$$


Lemma 9.12: With these operations, S is a ring. 


Proof: The identity is (1, 1, 1, …), the zero (0, 0, 0, …), and the additive inverse of (aₙ) is (−aₙ). All verifications are routine.
We say ‘ring’ because S is not a field. If all sₙ are non-zero, then (sₙ) has a multiplicative inverse (1/sₙ). But if any term sₙ = 0, then this does not work. For example, (0, 1, 1, …) cannot have an inverse (b₁, b₂, b₃, …) since

$$(0, 1, 1, \dots)(b_1, b_2, b_3, \dots) = (0, b_2, b_3, \dots) \neq (1, 1, 1, \dots).$$


As we saw in chapter 2, every real number may be viewed as the ‘limit’ of a 
sequence of rationals. In the present context, we can take over the definition 
of convergence given in that chapter, provided that we insist that the ε in the
definition is rational.


Definition 9.13: A sequence of rationals (sₙ) converges to l ∈ Q if, given any ε ∈ Q, ε > 0, there exists N ∈ N such that

$$n > N \Rightarrow |s_n - l| < \varepsilon.$$


This definition is not yet satisfactory, however: convergence to a rational 
limit is not what really interests us. It fails to deal with real numbers like √2,
for example. We need a replacement for ‘convergent’ that does not specify 
the limit as such. 

For the sake of argument, assume that it makes sense to talk of a sequence of rationals converging to a real limit. Certainly this is so in our intuitive models, Q ⊆ R. The catch is that formally we do not know what the limit is. Nonetheless, if (sₙ) were to converge to a real number l, then there would exist some N such that

$$|s_n - l| < \varepsilon \quad \text{for all } n > N.$$

Hence also

$$|s_m - l| < \varepsilon \quad \text{for all } m > N.$$

Combining the two inequalities by the triangle inequality, we obtain

$$|s_m - s_n| < 2\varepsilon \quad \text{for all } m, n > N.$$


Now this statement does not involve the hypothetical real number l. But it still captures the idea of convergence.

To tidy things up, we start again with ½ε instead of ε, and thereby obtain the essential idea:


Definition 9.14: A sequence (sₙ) of rational numbers is a Cauchy sequence if for any rational ε > 0 there exists N such that

$$m, n > N \Rightarrow |s_m - s_n| < \varepsilon.$$
Intuitively, the terms of such a sequence get closer and closer together.
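As a concrete illustration, the decimal truncations sₙ of √2 form a Cauchy sequence of rationals: for m, n > N both terms lie in the interval from s_N up to (but not including) s_N + 10⁻ᴺ. The Python sketch below (function names ours) computes sₙ exactly with `fractions.Fraction` and `math.isqrt`, and for a given rational ε finds an N that works. Checking a finite window of terms illustrates the definition, but of course proves nothing about the infinite tail.

```python
from fractions import Fraction
from math import isqrt

def s(n):
    """sqrt(2) truncated to n decimal places, as an exact rational."""
    scale = 10 ** n
    return Fraction(isqrt(2 * scale * scale), scale)

def cauchy_index(eps):
    """Smallest N with 10**(-N) <= eps.  For m, n > N both s_m and s_n lie in
    [s_N, s_N + 10**(-N)), so |s_m - s_n| < eps."""
    N = 0
    while Fraction(1, 10 ** N) > eps:
        N += 1
    return N

eps = Fraction(1, 1000)
N = cauchy_index(eps)
window = [s(n) for n in range(N + 1, N + 30)]
# A finite check only: it illustrates, but cannot prove, the Cauchy condition.
assert all(abs(x - y) < eps for x in window for y in window)
assert all(x * x < 2 for x in window)   # every term is a rational below sqrt(2)
```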

This concept is named after Augustin-Louis Cauchy, a prolific nineteenth- 
century French mathematician who made extensive use of such sequences. 
However, it was Georg Cantor who first realised how to use such sequences 
to construct the real numbers using the method presented here. 

Cauchy sequences may be considered intuitively as sequences of rational 
approximations to a real number and this provides the raw material for a for- 
mal construction of the real numbers starting from the rationals. The proof 
requires several lemmas. For the first, we say that a sequence (sₙ) is bounded
if there exists a fixed number M such that |sₙ| < M for all n.


Lemma 9.15: Every Cauchy sequence is bounded in Q. 

Proof: Taking ε = 1 in the definition of a Cauchy sequence, there exists N such that |sₘ − sₙ| < 1 for m, n > N. Thus for all n > N we have $|s_n - s_{N+1}| < 1$; that is, $|s_n| < |s_{N+1}| + 1$. Hence, for all n ∈ N,

$$|s_n| < M, \quad \text{where } M = \max\{|s_1|, |s_2|, \dots, |s_N|, |s_{N+1}|\} + 1.$$


Lemma 9.16: If (aₙ) and (bₙ) are Cauchy sequences, then so are (aₙ + bₙ), (aₙbₙ), and (−aₙ).


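Proof: If ε > 0 is rational, there exist N₁ and N₂ such that

$$
\begin{aligned}
m, n > N_1 &\Rightarrow |a_m - a_n| < \tfrac12\varepsilon,\\
m, n > N_2 &\Rightarrow |b_m - b_n| < \tfrac12\varepsilon.
\end{aligned}
$$

So for m, n > N = max(N₁, N₂) we have

$$
\begin{aligned}
|(a_m + b_m) - (a_n + b_n)| &= |(a_m - a_n) + (b_m - b_n)|\\
&\le |a_m - a_n| + |b_m - b_n|\\
&< \tfrac12\varepsilon + \tfrac12\varepsilon\\
&= \varepsilon,
\end{aligned}
$$

so (aₙ + bₙ) is Cauchy.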

To show that (aₙbₙ) is Cauchy, use lemma 9.15 to show that there exist A, B ∈ Q such that |aₙ| < A and |bₙ| < B for all n ∈ N. Using a little foresight (the authors have seen this proof before!), given ε ∈ Q, ε > 0, observe that ε/(A + B) ∈ Q and ε/(A + B) > 0. Therefore there exist N₁, N₂ such that

$$
\begin{aligned}
m, n > N_1 &\Rightarrow |a_m - a_n| < \frac{\varepsilon}{A+B},\\
m, n > N_2 &\Rightarrow |b_m - b_n| < \frac{\varepsilon}{A+B}.
\end{aligned}
$$

If m, n > N = max(N₁, N₂) then both inequalities hold, so

$$
\begin{aligned}
|a_m b_m - a_n b_n| &= |(a_m - a_n)\,b_m + a_n\,(b_m - b_n)|\\
&\le |a_m - a_n|\,|b_m| + |a_n|\,|b_m - b_n|\\
&< \frac{\varepsilon}{A+B}\,B + A\,\frac{\varepsilon}{A+B}\\
&= \varepsilon.
\end{aligned}
$$

Therefore (aₙbₙ) is Cauchy.
Finally, (−aₙ) may be proved Cauchy either by a direct calculation, or by
putting bₙ = −1 for all n in the above.
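The proof of lemma 9.16 is constructive: from values of N that work for (aₙ) and (bₙ) it produces a value that works for the sum and the product. Here is a minimal Python sketch of this, under our own assumption that a Cauchy sequence is packaged together with a function ε ↦ N and a bound from lemma 9.15 (the class `CauchySeq` and its fields are hypothetical names, not notation from the text).

```python
import math
from dataclasses import dataclass
from fractions import Fraction
from typing import Callable

@dataclass
class CauchySeq:
    term: Callable[[int], Fraction]      # n -> s_n, for n = 1, 2, 3, ...
    modulus: Callable[[Fraction], int]   # eps -> N with |s_m - s_n| < eps for m, n > N
    bound: Fraction                      # A > 0 with |s_n| < A for all n (lemma 9.15)

def add(a, b):
    # Proof of lemma 9.16: use eps/2 for each summand.
    return CauchySeq(
        term=lambda n: a.term(n) + b.term(n),
        modulus=lambda eps: max(a.modulus(eps / 2), b.modulus(eps / 2)),
        bound=a.bound + b.bound,
    )

def mul(a, b):
    # The "little foresight": use eps/(A + B) for both factors.
    A, B = a.bound, b.bound
    return CauchySeq(
        term=lambda n: a.term(n) * b.term(n),
        modulus=lambda eps: max(a.modulus(eps / (A + B)), b.modulus(eps / (A + B))),
        bound=A * B,
    )

# Example: (1/n) and the constant sequence (3, 3, 3, ...).
recip = CauchySeq(lambda n: Fraction(1, n), lambda eps: math.ceil(1 / eps), Fraction(2))
three = CauchySeq(lambda n: Fraction(3), lambda eps: 0, Fraction(4))

prod = mul(recip, three)
eps = Fraction(1, 100)
N = prod.modulus(eps)
indices = range(N + 1, N + 50)
assert all(abs(prod.term(m) - prod.term(n)) < eps for m in indices for n in indices)
```

The modulus for (1/n) works because |1/m − 1/n| < 1/N ≤ ε whenever m, n > N ≥ 1/ε.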


Letting C denote the set of all Cauchy sequences, we have: 


Proposition 9.17: With addition and multiplication of sequences as de- 
fined, C is a ring. 


Proof: If (aₙ), (bₙ) ∈ C then lemma 9.16 says that (aₙ) + (bₙ), (aₙ)(bₙ), and −(aₙ) ∈ C. Clearly the zero sequence (0, 0, …) and the unit sequence (1, 1, …) belong to C. Looking at the axioms for a ring we see that this takes care of (A3), (A4), and (M3). The remaining axioms hold since, by lemma 9.12, they hold for all sequences of rationals.


However, we still do not have a field, for a sequence like (0, 1, 1, 1, ...) 
is Cauchy, non-zero, and has no inverse. To overcome this we take note 
of another problem: intuitively speaking, different Cauchy sequences can 
converge to the same limit. We have already encountered this in decimal 
notation: 


(1, 1, 1, 1, 1, …)
and
(0·9, 0·99, 0·999, 0·9999, 0·99999, …)
both converge to 1.


Both difficulties evaporate when we introduce one further concept: 


Definition 9.18: A sequence (sₙ) of rationals is a null sequence if it converges to 0. That is, for all rational ε > 0 there exists N such that |sₙ| < ε whenever n > N.
If two sequences (aₙ) and (bₙ) tend to the same limit l, then it is easy to see that the sequence (aₙ − bₙ) is null. This inspires an equivalence relation on C:

$$(a_n) \sim (b_n) \iff (a_n - b_n) \text{ is null}.$$

To check that this is an equivalence relation, observe that the properties (aₙ) ~ (aₙ) and (aₙ) ~ (bₙ) ⇒ (bₙ) ~ (aₙ) are trivial. If (aₙ) ~ (bₙ) and (bₙ) ~ (cₙ) then (aₙ − bₙ) and (bₙ − cₙ) are null, that is, they converge to 0. So ((aₙ − bₙ) + (bₙ − cₙ)) converges to zero (by the same ½ε argument as in lemma 9.16), that is, (aₙ − cₙ) is null, so (aₙ) ~ (cₙ).
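The transitivity step can be made explicit in the same spirit. Below, (aₙ) = (1, 1, 1, …), (bₙ) = (0·9, 0·99, …) and (cₙ) = (1 + 1/n) are our own example sequences; each function `N_…` (a hypothetical name) returns an N beyond which the relevant difference is below ε, and the witness for (aₙ − cₙ) is built from the other two using ½ε, exactly as in the argument above. Again the final loop checks only finitely many terms.

```python
import math
from fractions import Fraction

# Three Cauchy sequences (indexed from n = 1) that all "converge to 1".
def a(n):
    return Fraction(1)                 # (1, 1, 1, ...)

def b(n):
    return 1 - Fraction(1, 10 ** n)    # (0.9, 0.99, 0.999, ...)

def c(n):
    return 1 + Fraction(1, n)          # (2, 3/2, 4/3, ...)

# Witnesses of nullness: eps -> N such that the difference is below eps for all n > N.
def N_ab(eps):
    # |a_n - b_n| = 10**(-n)
    N = 0
    while Fraction(1, 10 ** N) > eps:
        N += 1
    return N

def N_bc(eps):
    # |b_n - c_n| = 1/n + 10**(-n) < 2/n
    return math.ceil(2 / eps)

def N_ac(eps):
    # Transitivity: both differences below eps/2 forces |a_n - c_n| < eps.
    return max(N_ab(eps / 2), N_bc(eps / 2))

for eps in (Fraction(1, 3), Fraction(1, 1000)):
    N = N_ac(eps)
    assert all(abs(a(n) - c(n)) < eps for n in range(N + 1, N + 200))
```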


Definition 9.19: R is the set of equivalence classes of Cauchy sequences, and the equivalence class containing (sₙ) is denoted by [sₙ] or [s₁, s₂, …, sₙ, …]. For q ∈ Q, [q, q, …, q, …] will also be denoted by q̂ ∈ R.


The alternative notation allows us to distinguish clearly between the equivalence class [sₙ] corresponding to a given Cauchy sequence (sₙ) and the equivalence class ŝₙ corresponding to the specific element sₙ for a fixed value of n. For instance, for sₙ = 1/n,

$$[s_n] = [1, \tfrac12, \tfrac13, \dots, \tfrac1n, \dots]$$

while

$$\widehat{s_n} = [\tfrac1n, \tfrac1n, \tfrac1n, \dots].$$
Definition 9.20: The operations of addition and multiplication are transferred to R by defining

$$
\begin{aligned}
[a_n] + [b_n] &= [a_n + b_n],\\
[a_n]\,[b_n] &= [a_n b_n].
\end{aligned}
$$


By now, you should, as a reflex, be wondering whether these operations are well defined. Yes, they are. For if [aₙ] = [a′ₙ] and [bₙ] = [b′ₙ] then (aₙ − a′ₙ) and (bₙ − b′ₙ) are null. Hence ((aₙ + bₙ) − (a′ₙ + b′ₙ)) is null, so [aₙ + bₙ] = [a′ₙ + b′ₙ].

Multiplication is a little less straightforward. By lemma 9.15 there exist rationals A, B such that

$$|a_n| < A, \quad |b'_n| < B \quad \text{for all } n \in \mathbb{N}.$$

Given rational ε > 0 we can find N₁, N₂ such that

$$
\begin{aligned}
n > N_1 &\Rightarrow |a_n - a'_n| < \varepsilon/(A+B),\\
n > N_2 &\Rightarrow |b_n - b'_n| < \varepsilon/(A+B).
\end{aligned}
$$

If n > N = max(N₁, N₂) then

$$
\begin{aligned}
|a_n b_n - a'_n b'_n| &= |a_n\,(b_n - b'_n) + (a_n - a'_n)\,b'_n|\\
&\le |a_n|\,|b_n - b'_n| + |a_n - a'_n|\,|b'_n|\\
&< A\,\frac{\varepsilon}{A+B} + \frac{\varepsilon}{A+B}\,B\\
&= \varepsilon.
\end{aligned}
$$

Thus (aₙbₙ − a′ₙb′ₙ) is null, so [aₙbₙ] = [a′ₙb′ₙ].

To show that these operations make R an ordered field, we need to verify all the field axioms (A1)-(A4), (M1)-(M4), (D) and define an order on R that satisfies the order axioms (O1)-(O3). Most of these are straightforward, but when we attempt to define the subset R* of non-negative elements, we need to take care of the possibility that an equivalence class [aₙ] may be non-negative even though some of the individual terms aₙ are not. (For instance, we may have a₁ = −1, aₙ = 1 for n > 1.) We deal with this problem by showing that if a sequence (aₙ) is not null, then after a certain stage (say for some N₀ ∈ N) later terms aₙ (for n > N₀) are either all strictly positive or all strictly negative. We make this idea precise through the following definition:


Definition 9.21: A Cauchy sequence (aₙ) is strictly positive if there is a rational number ε > 0 and an N₀ ∈ N such that aₙ > ε for all n > N₀. It is strictly negative if there is a rational number ε > 0 and an N₀ ∈ N such that aₙ < −ε for all n > N₀.


We can then prove: 


Lemma 9.22: If (aₙ) is a Cauchy sequence then it is precisely one of the following:


(i) a null sequence 
(ii) strictly positive 
(iii) strictly negative. 


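Proof: Because (aₙ) is a Cauchy sequence,

$$\forall \varepsilon \in \mathbb{Q},\ \varepsilon > 0\ \ \exists N_0 : m, n > N_0 \Rightarrow |a_m - a_n| < \varepsilon. \tag{9.4}$$

A Cauchy sequence may be null, as in (i). If not, then (aₙ) does not tend to zero, so for some positive rational value, which we write as 2ε, the statement

$$\exists N \in \mathbb{N}\ \forall m \in \mathbb{N} : m > N \Rightarrow |a_m| < 2\varepsilon$$

is false, so that the following is true:

$$\forall N \in \mathbb{N}\ \exists m \in \mathbb{N},\ m > N : |a_m| \ge 2\varepsilon.$$

In particular, taking N = N₀ from (9.4) for this ε,

$$\exists m > N_0 : |a_m| \ge 2\varepsilon, \tag{9.5}$$

and combining this with (9.4) gives

$$n > N_0 \Rightarrow |a_m - a_n| < \varepsilon. \tag{9.6}$$

From (9.5), either aₘ ≥ 2ε, and (9.6) implies

$$n > N_0 \Rightarrow a_n > \varepsilon,$$

which gives (ii), or aₘ ≤ −2ε and (9.6) implies

$$n > N_0 \Rightarrow a_n < -\varepsilon,$$

and this gives (iii).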


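Summing up, if the Cauchy sequence (aₙ) does not satisfy (i), it must satisfy (ii) or (iii). Conversely, a sequence satisfying (ii) or (iii) has |aₙ| > ε for all n > N₀, so it cannot be null, and no sequence can be both strictly positive and strictly negative. Hence precisely one of (i), (ii), (iii) holds, as required.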


Proposition 9.23: With the given operations for [aₙ] + [bₙ] and [aₙ][bₙ], R is a field.


Proof: Verification of axioms (A1)-(A4), (M1)-(M3), and (D) is straightforward. The zero element is [0, 0, 0, …], the unit element [1, 1, 1, …], and the negative of [aₙ] is [−aₙ].

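However, the inverse 1/[aₙ] requires a little ingenuity. By lemma 9.22, if [aₙ] ≠ [0] then (aₙ) must be strictly positive or strictly negative, so there are a rational ε > 0 and an N₀ such that |aₙ| > ε for all n > N₀; in particular, aₙ ≠ 0 for n > N₀. We can then define (bₙ) by

$$b_n = \begin{cases} 0 & \text{if } n \le N_0,\\ 1/a_n & \text{if } n > N_0, \end{cases}$$

so that

$$a_n b_n = \begin{cases} 0 & \text{if } n \le N_0,\\ 1 & \text{if } n > N_0. \end{cases}$$

The sequence (bₙ) is Cauchy, because for m, n > N₀ we have |bₘ − bₙ| = |aₙ − aₘ|/(|aₘ||aₙ|) ≤ |aₘ − aₙ|/ε², which can be made smaller than any given rational δ > 0. Then (aₙbₙ) is a sequence whose terms equal 1 for n > N₀, so it differs from (1, 1, 1, …) by a null sequence; hence [aₙbₙ] = [1, 1, 1, …] and [bₙ] is the inverse of [aₙ]. This completes the proof that R is a field.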
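A small Python sketch of the inverse construction, using our own illustrative sequence aₙ = (n − 2)/n, which is strictly positive with ε = ½ and N₀ = 4 even though a₂ = 0 (the function name `inverse_sequence` is hypothetical). It checks that aₙbₙ is 0 up to N₀ and exactly 1 afterwards, and spot-checks the estimate |bₘ − bₙ| ≤ |aₘ − aₙ|/ε² used to show that (bₙ) is Cauchy.

```python
from fractions import Fraction

def inverse_sequence(a, N0):
    """b_n = 0 for n <= N0 and b_n = 1/a_n for n > N0 (assumes a_n != 0 for n > N0)."""
    def b(n):
        return Fraction(0) if n <= N0 else 1 / a(n)
    return b

def a(n):
    # Illustrative sequence (-1, 0, 1/3, 1/2, 3/5, ...), with limit 1; a_2 = 0.
    return Fraction(n - 2, n)

N0, eps = 4, Fraction(1, 2)     # a_n = 1 - 2/n > 1/2 for every n > 4
b = inverse_sequence(a, N0)

# a_n b_n is 0 up to N0 and exactly 1 afterwards, so it differs from
# (1, 1, 1, ...) only in finitely many terms.
assert [a(n) * b(n) for n in range(1, 10)] == [0, 0, 0, 0, 1, 1, 1, 1, 1]

# The estimate |b_m - b_n| <= |a_m - a_n| / eps**2 used to show (b_n) is Cauchy.
for m in range(N0 + 1, 40):
    for n in range(N0 + 1, 40):
        assert abs(b(m) - b(n)) <= abs(a(m) - a(n)) / eps ** 2
```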
The Ordering on R


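Lemma 9.22 lets us define an order on R in which [aₙ] < 0 if (aₙ) is strictly negative, [aₙ] is zero if (aₙ) is null, and [aₙ] > 0 if (aₙ) is strictly positive. (These conditions depend only on the equivalence class: adding a null sequence to a strictly positive sequence leaves it strictly positive, and similarly for strictly negative.) The equivalent weak order can then be defined as follows: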


Definition 9.24: [aₙ] ∈ R* if and only if (aₙ) is either null or strictly positive.


Proposition 9.25: R is an ordered field. 
Proof: 


(O1) Suppose that [aₙ], [bₙ] ∈ R*. Then by considering the cases where each of (aₙ), (bₙ) is null or strictly positive, it is an exercise to prove that [aₙ + bₙ] ∈ R* and [aₙbₙ] ∈ R*.

(O2) If [aₙ] ∈ R, then by lemma 9.22, the sequence (aₙ) is either null, strictly positive, or strictly negative. If it is null or strictly positive then [aₙ] ∈ R*; otherwise (aₙ) is strictly negative, in which case (−aₙ) is strictly positive and −[aₙ] ∈ R*.

(O3) If [aₙ] ∈ R* and −[aₙ] ∈ R*, then by lemma 9.22 the only possibility is [aₙ] = [0].


Completeness of R 


The trickiest property is completeness. Recall that Q is embedded in R by defining

$$\hat q = [q, q, q, \dots] \quad \text{for } q \in \mathbb{Q}.$$

Then Q is order isomorphic to the subset Q̂ = {q̂ ∈ R | q ∈ Q} of R, an assertion that is readily checked.

Our plan of attack is to show that any non-empty subset X ⊆ R that is bounded above by some k ∈ R has a least upper bound. We first show that we can find l, r ∈ Q so that l̂ ∈ R is not an upper bound for X but r̂ ∈ R is an upper bound. Then we perform a bisection argument to get an increasing sequence (lₙ) of rationals for which the l̂ₙ are not upper bounds and a decreasing sequence (rₙ) for which the r̂ₙ are upper bounds, where

$$0 < r_n - l_n \le (r - l)/2^n,$$

so that the two Cauchy sequences (lₙ) and (rₙ) tend to the same limit

$$[l_n] = [r_n],$$

which is the required least upper bound for X.
We get the proof started with a simple lemma:


Lemma 9.26: If x ∈ R, then l̂ < x < r̂ for some l̂, r̂ ∈ Q̂.


Proof: Let (aₙ) be a Cauchy sequence such that [aₙ] = x. By lemma 9.15, |aₙ| < A for all n, for some A ∈ Q. Choose l ∈ Q with l < −A and r ∈ Q with r > A; then l̂ < [aₙ] < r̂. (For instance, aₙ − l > −A − l > 0 for all n, so (aₙ − l) is strictly positive.)


Theorem 9.27: R is a complete ordered field. 


Proof: By proposition 9.25, R is an ordered field. 

To establish completeness, let X be a non-empty subset of R bounded above by k ∈ R.

Because X is non-empty, X must contain an element x ∈ R. By lemma 9.26, there is l̂ ∈ Q̂ with l̂ < x, so l̂ is not an upper bound of X.

By lemma 9.26 applied to the upper bound k ∈ R, we have k < r̂ for some r̂ ∈ Q̂, and so r̂ is also an upper bound of X.

Start with l₀ = l and r₀ = r. Suppose that for some n ≥ 0 we have found lₙ ∈ Q such that l̂ₙ ∈ R is not an upper bound for X, and rₙ ∈ Q such that r̂ₙ ∈ R is an upper bound for X. This is already true for n = 0.

Let mₙ = (lₙ + rₙ)/2 ∈ Q be the midpoint between lₙ and rₙ. If m̂ₙ is not an upper bound for X, set

$$l_{n+1} = m_n, \qquad r_{n+1} = r_n;$$

otherwise set

$$l_{n+1} = l_n, \qquad r_{n+1} = m_n.$$
By induction, this gives:

an increasing sequence (lₙ) where lₙ ∈ Q and l̂ₙ ∈ R is not an upper bound of X,

and

a decreasing sequence (rₙ) where rₙ ∈ Q and r̂ₙ ∈ R is an upper bound of X.
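The bisection step is easy to run when the question ‘is this rational an upper bound of X?’ can actually be decided, which is not true of an arbitrary X ⊆ R. The sketch below therefore assumes a hypothetical test set X = {q ∈ Q | q² < 2} and starts from l = 1, r = 2 (function names ours); it checks that rₙ − lₙ = (r − l)/2ⁿ and that the l̂ₙ are never upper bounds while the r̂ₙ always are.

```python
from fractions import Fraction

def is_upper_bound(x):
    """Hypothetical X = {q in Q : q*q < 2}.  A rational x bounds X above
    exactly when x >= 0 and x*x >= 2 (equality is impossible for rational x)."""
    return x >= 0 and x * x >= 2

def bisect(l, r, steps):
    """Return [l_0, ..., l_steps] and [r_0, ..., r_steps] as in the proof."""
    ls, rs = [l], [r]
    for _ in range(steps):
        m = (l + r) / 2
        if is_upper_bound(m):
            r = m                # r_{n+1} = m_n
        else:
            l = m                # l_{n+1} = m_n
        ls.append(l)
        rs.append(r)
    return ls, rs

ls, rs = bisect(Fraction(1), Fraction(2), 30)
d = rs[0] - ls[0]
assert all(rs[n] - ls[n] == d / 2 ** n for n in range(31))
assert not any(is_upper_bound(x) for x in ls)
assert all(is_upper_bound(x) for x in rs)
print(float(ls[-1]), float(rs[-1]))   # both close to sqrt(2)
```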


Figure 9.2 shows a particular case where the set X is marked on R and the sequences of rational numbers l₀ = l, l₁, …, lₙ, … and r₀ = r, r₁, …, rₙ, … are marked on the rational line Q.


Fig. 9.2 Homing in on the least upper bound of X


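Let d = r − l. Then the length of the interval from lₙ to rₙ is d/2ⁿ, and for m, n > N we have

$$|l_m - l_n| < d/2^N, \qquad |r_m - r_n| < d/2^N, \qquad |r_n - l_n| < d/2^N.$$

Since d/2ᴺ can be made smaller than any given rational ε > 0 by taking N large enough, both (lₙ) and (rₙ) are Cauchy sequences and their difference is a null sequence, so they represent the same equivalence class:

$$u = [l_n] = [r_n] \in \mathbb{R}.$$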


The element u ∈ R is an upper bound of X ⊆ R. (If not, then there would be an element x ∈ X where u < x. Because (r̂ₙ) tends down to u, we could find r̂ₙ satisfying u ≤ r̂ₙ < x, contradicting the fact that r̂ₙ is an upper bound.) It is also the least upper bound, for if k were an upper bound where k < u then, because (l̂ₙ) tends up to the limit u, we could find an element l̂ₙ with k < l̂ₙ ≤ u. Since l̂ₙ is not an upper bound, some x ∈ X satisfies x > l̂ₙ > k, contradicting the fact that k is an upper bound.


As before, we have an order isomorphism between the elements q ∈ Q and the elements q̂ = [q, q, …, q, …] ∈ R, so that we may identify Q with a subset of R.

Finally we have a chain of number systems 


N ⊂ N₀ ⊂ Z ⊂ Q ⊂ R
as intended. 
Exercises 
1. First, some bookkeeping. 
(a) Write out a full proof of proposition 9.7. 


(b) Complete the proof of proposition 9.9. 
(c) Prove theorem 9.10.
(d) Prove theorem 9.11. 


2. If R is a ring with zero element $0_R$ and unit element $1_R$, for n ∈ N₀ define $n_R$ recursively by

$$(0)_R = 0_R, \qquad (n+1)_R = n_R + 1_R,$$

and for x ∈ R, n ∈ N₀ define $x^n \in R$ recursively by

$$x^0 = 1_R, \qquad x^{n+1} = x^n x.$$

If $\binom{n}{r}$ is the binomial coefficient n!/[r!(n − r)!], prove that for all x, y ∈ R

$$(x+y)^n = x^n + n_R\, x^{n-1}y + \dots + \binom{n}{r}_{\!R} x^{n-r}y^r + \dots + y^n.$$
3. If p ∈ N is a prime and $p_R = 0_R$ in R (as is the case, for instance, in Zₚ), show that

$$(x+y)^p = x^p + y^p.$$

Give an example of a ring R where $n_R = 0_R$ for some n ≠ 0, but $(x+y)^n \neq x^n + y^n$.
4. If R is an ordered ring, use the definition of order in this chapter to show that

$$x^2 - 5_R\, x + 6_R > 0_R$$

if and only if $x > 3_R$ or $x < 2_R$.


5. Use the Euclidean algorithm to prove that if m, n ∈ N are coprime, that is, have no common factor in N greater than 1, then there exist a, b ∈ Z such that

$$am + bn = 1.$$


Find a, b when m = 1008, n = 1375. 
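If you want to check your hand calculation, here is a sketch of the extended Euclidean algorithm in Python (the function name is ours). It keeps each remainder written as a combination a·m + b·n while running the ordinary Euclidean algorithm, so the last non-zero remainder comes out expressed in the required form. The final assertion confirms that 1008 and 1375 are coprime without printing a and b, so the exercise is not spoiled.

```python
def extended_euclid(m, n):
    """Return (g, a, b) with g = gcd(m, n) and a*m + b*n == g, for m, n >= 0."""
    # Invariants: r0 == a0*m + b0*n and r1 == a1*m + b1*n at every step.
    r0, a0, b0 = m, 1, 0
    r1, a1, b1 = n, 0, 1
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        a0, a1 = a1, a0 - q * a1
        b0, b1 = b1, b0 - q * b1
    return r0, a0, b0

g, a, b = extended_euclid(1008, 1375)
assert g == 1 and a * 1008 + b * 1375 == 1   # 1008 and 1375 are coprime
```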


6. Prove formally that every positive rational number can be written uniquely in the form m/n for coprime m, n ∈ N. This is called ‘expressing a fraction in its lowest terms’. If two rationals p/q, r/s are in lowest terms, is (ps + qr)/(qs) in lowest terms? What about (pr)/(qs)?

Show that if p/q is in lowest terms, so is p²/q². Use the uniqueness of expression in lowest terms to give a streamlined proof that √2 is irrational.
7. Prove the following results in any ordered field.

(a) a < b ⟺ −b < −a,

(b) a ≤ b ⟺ −b ≤ −a,

(c) −1 < 0 < 1,

(d) if a ≠ 0, then a² > 0,

(e) 0 < a < b ⇒ 0 < b⁻¹ < a⁻¹,

(f) if a < 0 and b < 0, then ab > 0.

8. Prove that every non-empty finite subset X of an ordered field contains a smallest element and a largest element. (A smallest element is an element x ∈ X such that x ≤ y for all y ∈ X; a largest element is defined similarly.) Is the same true if we drop the condition that X be finite?

9. In the definition of the order relation on R, why is it not a good idea to define

$$[a_n] > 0 \ \text{ if } \ \exists N \in \mathbb{N} \text{ such that } a_n > 0 \text{ for all } n > N?$$


10. Let

$$a_n = \frac{1}{2!} - \frac{1}{3!} + \dots + \frac{(-1)^{n+1}}{(n+1)!}.$$

Prove that (aₙ) is Cauchy, so tends to some limit l. Prove that each aₙ is rational, but l is not.
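A numerical sanity check for exercise 10, assuming the formula aₙ = 1/2! − 1/3! + ⋯ + (−1)ⁿ⁺¹/(n + 1)!. The sketch computes aₙ exactly as a `Fraction`, spot-checks the alternating-series estimate |aₘ − aₙ| ≤ 1/(n + 2)! for m > n (one route to the Cauchy property), and compares a₁₅ with the floating-point value of e⁻¹. None of this proves that l is irrational; that is the point of the exercise.

```python
from fractions import Fraction
from math import exp, factorial

def a(n):
    """a_n = 1/2! - 1/3! + ... + (-1)**(n+1)/(n+1)!, as an exact rational."""
    return sum(Fraction((-1) ** k, factorial(k)) for k in range(2, n + 2))

# Alternating-series estimate: |a_m - a_n| <= 1/(n+2)! whenever m > n.
for n in range(1, 12):
    for m in range(n + 1, 15):
        assert abs(a(m) - a(n)) <= Fraction(1, factorial(n + 2))

# Numerically the terms approach exp(-1), although each a_n is rational.
assert abs(float(a(15)) - exp(-1)) < 1e-12
```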
CHAPTER 10


Real Numbers 
as a Complete Ordered Field 


In this chapter we show how to reverse the process used in the previous chapter. There we postulated the existence of a set satisfying the basic properties (N1)-(N3) of natural numbers, and eventually constructed a set R that is a complete ordered field. Here we start by postulating the existence of a complete ordered field, and work down until we reach the natural numbers. This approach is basically simpler from a technical point of view; for example, we really do get N ⊂ Z ⊂ Q ⊂ R without any fudging with order isomorphisms. However, as remarked in the previous chapter, we have to accept a rather lengthy set of axioms, all interacting with each other, and some are distinctly complicated.

We begin with examples of fields, rings, ordered fields, and ordered rings, to show that a wide variety of such structures exists. Any system that obeys a formal set of axioms is called a model for those axioms, and the power of the axiomatic method is that any deduction from the axioms is true in any model for those axioms. So any valid deduction from the axioms for an ordered field will hold in the models Q, R constructed in the last chapter; indeed in any system satisfying the axioms. Therefore we need only perform the deductions once, rather than over again for each model.

The axiomatic method has another kind of power: the ability to single out (up to isomorphism) a unique model. For example, this happened with axioms (N1)-(N3) for the natural numbers: all systems satisfying them are order isomorphic, so to all intents and purposes the same. The same holds for the axioms for a complete ordered field: they define a unique system, up to order isomorphism. It is therefore permissible to call such a system the real numbers.

As we think about the system of real numbers, we can now imagine it as a unique crystalline concept whose properties hold together in a coherent manner. It may be represented mentally in many ways: as a system satisfying
the 13 axioms for a complete ordered field; symbolically, as infinite decimals; 
and visually, as points on a number line that satisfy all 13 axioms including 
completeness. However, there is essentially just one such system that can be 
represented in different ways. 

This is the end of our quest: starting from intuitive ideas about points on a 
line and decimal expansions we have formulated a set of axioms that defines 
the required system uniquely. Moreover, within this unique system of real 
numbers, we can obtain equally simple descriptions of integers and rational 
numbers at the same time. 


Examples of Rings and Fields 


Not every system of axioms defines a unique structure, even if isomorphisms 
are permitted. For example, Z and Q are both rings, but they are not iso- 
morphic since Q is a field and Z is not. To motivate the narrowing down of 
possibilities by imposing extra axioms, we describe some further examples. 


Example 10.1: Zₙ, the ring of integers modulo n. Let n be an integer > 0, and for r, s ∈ Z define

$$r \equiv_n s \iff r - s = kn \ \text{ for some } k \in \mathbb{Z}.$$

It is easy to show that ≡ₙ is an equivalence relation, and we call the set of equivalence classes Zₙ. We denote the class containing m by mₙ.

The division algorithm implies that if m, n ∈ Z and n > 0, then there exist q, r ∈ Z, with 0 ≤ r < n, such that m = qn + r. Thus m − r = qn, so every integer is equivalent under ≡ₙ to an integer r with 0 ≤ r < n. Thus the elements of Zₙ are 0ₙ, 1ₙ, …, (n − 1)ₙ. As in chapter 4 (where we treated the special case n = 3), we can define operations on the equivalence classes by

$$r_n + k_n = (r + k)_n, \qquad r_n k_n = (rk)_n.$$

These operations are well defined and satisfy the axioms for a ring with zero element 0ₙ and unity 1ₙ.

If n is not prime, then Zₙ is not a field. For if n = rk with 0 < r < n, 0 < k < n, then

$$r_n k_n = n_n = 0_n.$$

Say that an element x of a ring is a zero-divisor if x ≠ 0, but xy = 0 for some y ≠ 0, y in the ring. Then rₙ and kₙ are zero-divisors. But a field has no zero-divisors, for if xy = 0 in a field and y ≠ 0, then x = xyy⁻¹ = 0y⁻¹ = 0. Thus Zₙ is not a field for composite n.

For instance, in Z₆ we have 2₆3₆ = 0₆, where 2₆ ≠ 0₆ and 3₆ ≠ 0₆. These elements do not have multiplicative inverses in Z₆. For 2₆ we can check this bare-handed by trying all six possibilities: 2₆0₆ = 0₆, 2₆1₆ = 2₆, 2₆2₆ = 4₆, 2₆3₆ = 0₆, 2₆4₆ = 2₆, 2₆5₆ = 4₆. Nowhere do we get the answer 1₆.
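The bare-handed check is easily mechanised. In this Python sketch (function names ours), `products_row(r, n)` lists rₙ0ₙ, rₙ1ₙ, …, rₙ(n − 1)ₙ as integers in the range 0 to n − 1, and `zero_divisors(n)` lists the non-zero classes that are zero-divisors in Zₙ.

```python
def zero_divisors(n):
    """Non-zero r in Z_n with r*k = 0 (mod n) for some non-zero k."""
    return [r for r in range(1, n) if any(r * k % n == 0 for k in range(1, n))]

def products_row(r, n):
    """The bare-handed check: r_n 0_n, r_n 1_n, ..., r_n (n-1)_n."""
    return [r * k % n for k in range(n)]

assert products_row(2, 6) == [0, 2, 4, 0, 2, 4]   # 1 never appears
assert zero_divisors(6) == [2, 3, 4]
assert zero_divisors(7) == []                      # 7 is prime
```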

However, if n is prime then Zₙ is a field. There are several ways to see this, of which the following is the least sophisticated but the most direct. Given rₙ ≠ 0ₙ (with 0 < r < n), we look for an inverse by calculating all the products

$$r_n 0_n,\ r_n 1_n,\ r_n 2_n,\ \dots,\ r_n (n-1)_n.$$

All of these elements are different, for if

$$r_n k_n = r_n l_n$$

where 0 ≤ k < l < n, then

$$r_n (l-k)_n = 0_n,$$

so that n divides r(l − k). But each factor lies strictly between 0 and n, and n is prime, which is a contradiction.

Now this list of products contains exactly n elements, all different, and since Zₙ only has n elements, each must occur precisely once. In particular, 1ₙ occurs somewhere, say at rₙkₙ = 1ₙ; and now kₙ is the required inverse. Hence Zₙ is a field if n is prime.

For instance, in Z₅, we look for an inverse for 3₅ by working out 3₅0₅ = 0₅, 3₅1₅ = 3₅, 3₅2₅ = 1₅, 3₅3₅ = 4₅, 3₅4₅ = 2₅, and the products are precisely the elements of Z₅ in the order 0₅, 3₅, 1₅, 4₅, 2₅. Among them is 1₅, and the inverse of 3₅ is 2₅.
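The counting argument can also be checked by machine for small n. This sketch (helper names ours) confirms that for prime n each row of products is a rearrangement of 0, 1, …, n − 1, recovers the inverse of 3₅ by locating 1 in its row, and checks for 2 ≤ n < 40 that every non-zero class has an inverse exactly when n is prime.

```python
from math import isqrt

def products_row(r, n):
    return [r * k % n for k in range(n)]

def is_prime(n):
    return n >= 2 and all(n % d != 0 for d in range(2, isqrt(n) + 1))

def inverse_by_search(r, n):
    """Return k with r_n k_n = 1_n, found by scanning the row of products, or None."""
    row = products_row(r, n)
    return row.index(1) if 1 in row else None

# For prime n each row (r != 0) is a rearrangement of 0, 1, ..., n - 1.
for n in range(2, 40):
    if is_prime(n):
        for r in range(1, n):
            assert sorted(products_row(r, n)) == list(range(n))

assert products_row(3, 5) == [0, 3, 1, 4, 2]
assert inverse_by_search(3, 5) == 2
# Z_n is a field exactly when n is prime (checked here for 2 <= n < 40).
assert all((None not in [inverse_by_search(r, n) for r in range(1, n)]) == is_prime(n)
           for n in range(2, 40))
```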


Example 10.2: Q(√2) = {a + b√2 ∈ R | a, b ∈ Q}. This is a field with zero element 0 + 0√2 and unity 1 + 0√2. The additive inverse of a + b√2 is −a − b√2, and if a + b√2 ≠ 0, its multiplicative inverse is

$$\frac{1}{a + b\sqrt2} = \frac{a - b\sqrt2}{(a + b\sqrt2)(a - b\sqrt2)} = \frac{a}{a^2 - 2b^2} + \frac{-b}{a^2 - 2b^2}\sqrt2.$$

(It is an easy exercise to show that if one of a, b is not 0, then a² − 2b² ≠ 0; in fact it is the same as proving √2 irrational.)
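Because a and b are rational, arithmetic in Q(√2) can be done exactly, without ever approximating √2. The class `QSqrt2` below is a hypothetical name for such a sketch; its `inverse` method uses the formula just given, and the final assertions check that x·x⁻¹ = 1 and x + (−x) = 0 for a sample element.

```python
from fractions import Fraction

class QSqrt2:
    """The number a + b*sqrt(2) with a, b rational, stored exactly as (a, b)."""

    def __init__(self, a, b=0):
        self.a, self.b = Fraction(a), Fraction(b)

    def __add__(self, other):
        return QSqrt2(self.a + other.a, self.b + other.b)

    def __neg__(self):
        return QSqrt2(-self.a, -self.b)

    def __mul__(self, other):
        # (a + b√2)(c + d√2) = (ac + 2bd) + (ad + bc)√2
        return QSqrt2(self.a * other.a + 2 * self.b * other.b,
                      self.a * other.b + self.b * other.a)

    def inverse(self):
        # 1/(a + b√2) = a/(a² - 2b²) + (-b/(a² - 2b²))√2
        norm = self.a ** 2 - 2 * self.b ** 2   # zero only when a = b = 0
        if norm == 0:
            raise ZeroDivisionError("0 + 0*sqrt(2) has no inverse")
        return QSqrt2(self.a / norm, -self.b / norm)

    def __eq__(self, other):
        return (self.a, self.b) == (other.a, other.b)

    def __repr__(self):
        return f"QSqrt2({self.a}, {self.b})"

x = QSqrt2(3, -2)
assert x * x.inverse() == QSqrt2(1)
assert x + (-x) == QSqrt2(0)
```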


Example 10.3: This will provide a useful counterexample later. It is the field R(t) of rational functions in an indeterminate t. An element of R(t) is most easily described as a quotient of two polynomials,

$$\frac{a_n t^n + \dots + a_0}{b_m t^m + \dots + b_0}$$

where a₀, …, aₙ, b₀, …, bₘ ∈ R and not all the bᵢ are zero. We can think of such an expression as giving rise to a function f: D → R where

$$D = \{x \in \mathbb{R} \mid b_m x^m + \dots + b_0 \neq 0\}$$

and

$$f(x) = \frac{a_n x^n + \dots + a_0}{b_m x^m + \dots + b_0}.$$
This is a quotient of polynomials in the same way that a rational number is a 
quotient of integers, hence the name ‘rational function’. 

A formal definition of R(t) can be given as follows. First, a polynomial is 
determined by its coefficients $a_0, \dots, a_n$, so we can define a formal polynomial
to be a sequence $s : \mathbb{N}_0 \to \mathbb{R}$ such that for some $n \in \mathbb{N}_0$ we have $s(m) = 0$
for $m > n$. Write $s(m) = s_m$ and denote $s$ by the sequence $(s_0, s_1, \dots, s_r, \dots)$
on the understanding that $s_m = 0$ from some point on. Addition and
multiplication are defined by

$$(s_0, s_1, \dots, s_r, \dots) + (p_0, p_1, \dots, p_r, \dots) = (s_0 + p_0, s_1 + p_1, \dots, s_r + p_r, \dots),$$
$$(s_0, s_1, \dots, s_r, \dots)(p_0, p_1, \dots, p_r, \dots) = (s_0 p_0, s_0 p_1 + s_1 p_0, \dots, q_r, \dots),$$

where $q_r = s_0 p_r + s_1 p_{r-1} + \cdots + s_r p_0$.
The sequence $(0, 1, 0, 0, \dots)$ can be denoted by $t$, and then

$$(s_0, s_1, \dots, s_r, \dots) = s_0 + s_1 t + \cdots + s_r t^r + \cdots,$$

so we recover the usual notation for a polynomial in $t$, as long as we identify
$s \in \mathbb{R}$ with the sequence $(s, 0, 0, \dots)$. The formal polynomials constitute
a ring.
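Formal polynomials are easy to model on a computer. In this minimal Python sketch a formal polynomial $(s_0, s_1, \dots)$ is represented by the finite list `[s_0, ..., s_r]` with trailing zeros removed, so the zero polynomial is `[]`. This representation and the function names are assumptions made for illustration. Multiplication computes exactly the terms $q_r = s_0 p_r + \cdots + s_r p_0$ defined above.

```python
from itertools import zip_longest


def trim(s):
    """Drop trailing zeros so each formal polynomial has one canonical list."""
    s = list(s)
    while s and s[-1] == 0:
        s.pop()
    return s


def poly_add(s, p):
    """(s_0, s_1, ...) + (p_0, p_1, ...) = (s_0 + p_0, s_1 + p_1, ...)."""
    return trim(x + y for x, y in zip_longest(s, p, fillvalue=0))


def poly_mul(s, p):
    """The r-th term of the product is q_r = s_0 p_r + s_1 p_{r-1} + ... + s_r p_0."""
    if not s or not p:
        return []
    q = [0] * (len(s) + len(p) - 1)
    for i, s_i in enumerate(s):
        for j, p_j in enumerate(p):
            q[i + j] += s_i * p_j   # s_i p_j contributes to q_{i+j}
    return trim(q)


t = [0, 1]                         # the sequence (0, 1, 0, 0, ...)
print(poly_mul(t, t))              # [0, 0, 1], that is, t^2
print(poly_mul([1, 1], [-1, 1]))   # [-1, 0, 1], that is, (1 + t)(-1 + t) = -1 + t^2
print(poly_add([1, 2], [3, -2]))   # [4]
```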

Using equivalence classes of ordered pairs, in exactly the way that we 
constructed Q from Z, we can construct R(t) from the ring of formal poly- 
nomials. The sum and product of rational functions are defined in the 
customary fashion, and the resulting structure is a field. Finally, we identify 
$\mathbb{R}$ with the subset of $\mathbb{R}(t)$ consisting of functions $a_0/1$, where $a_0 \in \mathbb{R}$.


It would be possible to exhibit many other interesting rings and fields; 
however, those listed above are especially pertinent to this chapter. 
Examples of Ordered Rings and Fields


Next we try to introduce orderings. We say ‘try to’ because the attempt 
doesn’t always succeed, and the reasons for the failure are instructive. 


(Non-)Example 10.4: $\mathbb{Z}_n$ cannot be ordered in a way that makes it into
an ordered ring. (Of course it can be ordered in a way that does not fit the
arithmetic, for example,

$$0_n < 1_n < 2_n < \cdots < (n-1)_n.$$

But this does not lead to an ordered ring, for $1_n > 0_n$, $(n-1)_n > 0_n$ would
then imply $0_n = 1_n + (n-1)_n > 0_n$, which is absurd.)

More generally, suppose we could give $\mathbb{Z}_n$ an order relation making it into
an ordered ring. Then there is a subset $\mathbb{Z}_n^+$ of positive elements satisfying
axioms (O1)-(O3). By (O2) either $1_n \in \mathbb{Z}_n^+$ or $-1_n \in \mathbb{Z}_n^+$. Since $1_n = 1_n \times 1_n =
(-1_n)(-1_n)$, either possibility implies that $1_n \in \mathbb{Z}_n^+$ using (O1). Now (O1) and
induction lead to $2_n = 1_n + 1_n \in \mathbb{Z}_n^+$, $3_n \in \mathbb{Z}_n^+, \dots, (n-1)_n \in \mathbb{Z}_n^+$. But this gives
the same contradiction as before. Hence $\mathbb{Z}_n$ cannot be given the structure of
an ordered ring.
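For small $n$ the conclusion can also be confirmed by exhaustive search. The Python sketch below tries every subset $P$ of the nonzero elements of $\mathbb{Z}_n$ as a candidate for $\mathbb{Z}_n^+$. It assumes the axioms take the form used in the argument above: $P$ is closed under addition and multiplication, and for each nonzero $x$ exactly one of $x$, $-x$ lies in $P$, with $0 \notin P$. The function names are hypothetical. A search over finitely many $n$ is evidence, not a proof. The argument above covers every $n$.

```python
from itertools import combinations


def is_positive_set(P, n):
    """Test closure of P under + and * (mod n) and trichotomy on the nonzero residues."""
    for x in P:
        for y in P:
            if (x + y) % n not in P or (x * y) % n not in P:
                return False
    # For each nonzero x, exactly one of x and -x must be in P.
    return all((x in P) != ((-x) % n in P) for x in range(1, n))


def positive_sets(n):
    """All candidate sets of positive elements of Z_n (subsets of {1, ..., n-1})."""
    nonzero = range(1, n)
    return [set(P) for k in range(n) for P in combinations(nonzero, k)
            if is_positive_set(set(P), n)]


for n in range(2, 11):
    print(n, positive_sets(n))   # the list is empty for every n
```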


Example 10.5: $\mathbb{Q}(\sqrt{2})$ can be given the structure of an ordered field in two
different ways.

The first way is to note that $\mathbb{Q}(\sqrt{2}) \subset \mathbb{R}$, and to restrict the usual order
relation on $\mathbb{R}$ to $\mathbb{Q}(\sqrt{2})$: clearly this gives $\mathbb{Q}(\sqrt{2})$ the structure of an ordered
field.

The second way is more subtle. There is a map $\theta : \mathbb{Q}(\sqrt{2}) \to \mathbb{Q}(\sqrt{2})$
defined by

$$\theta(a + b\sqrt{2}) = a - b\sqrt{2}.$$

Now $\theta$ is an isomorphism from $\mathbb{Q}(\sqrt{2})$ to itself (usually called an automorphism),
that is, $\theta$ is a bijection, and for all $x, y \in \mathbb{Q}(\sqrt{2})$

$$\theta(x + y) = \theta(x) + \theta(y),$$
$$\theta(xy) = \theta(x)\theta(y).$$

(Check this!) Denoting the first order relation, defined above, by $>$, we define
a new relation $\succ$ by

$$x \succ y \iff \theta(x) > \theta(y).$$

You should check that this, too, gives $\mathbb{Q}(\sqrt{2})$ the structure of an ordered
field. For example, if $x, y \succ 0$ then $\theta(x), \theta(y) > \theta(0) = 0$, so $\theta(x)\theta(y) > 0$,
so $\theta(xy) > 0$, so $xy \succ 0$. The remaining axioms are proved in the same way.
Note that in this ordering $\sqrt{2} \prec 0$, because $\theta(\sqrt{2}) = -\sqrt{2} < 0$.
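The second ordering can be explored with exact arithmetic. In this Python sketch, which is an illustration under stated assumptions rather than part of the text, the pair `(a, b)` of `Fraction`s stands for $a + b\sqrt{2}$. The function `usual_sign` decides the sign of $a + b\sqrt{2}$ in the usual order of $\mathbb{R}$ without any floating-point approximation, and `succ_positive` tests $x \succ 0$, that is, $\theta(x) > 0$. Random sampling only spot-checks closure under addition and multiplication. It does not replace the proof.

```python
import random
from fractions import Fraction


def usual_sign(a, b):
    """Exact sign (+1, 0 or -1) of a + b*sqrt(2) in the usual order on R."""
    if a >= 0 and b >= 0:
        return 0 if a == 0 and b == 0 else 1
    if a <= 0 and b <= 0:
        return -1
    # Opposite signs: the term with the larger square wins (a^2 = 2b^2 is impossible here).
    if a * a > 2 * b * b:
        return 1 if a > 0 else -1
    return 1 if b > 0 else -1


def theta(x):
    return (x[0], -x[1])                       # a + b√2  ->  a - b√2


def add(x, y):
    return (x[0] + y[0], x[1] + y[1])


def mul(x, y):
    return (x[0] * y[0] + 2 * x[1] * y[1], x[0] * y[1] + x[1] * y[0])


def succ_positive(x):
    """x ≻ 0 in the second ordering, i.e. theta(x) > 0 in the usual ordering."""
    return usual_sign(*theta(x)) > 0


def random_element():
    return tuple(Fraction(random.randint(-9, 9), random.randint(1, 9)) for _ in range(2))


random.seed(1)
for _ in range(2000):
    x, y = random_element(), random_element()
    if succ_positive(x) and succ_positive(y):
        assert succ_positive(add(x, y)) and succ_positive(mul(x, y))

sqrt2 = (Fraction(0), Fraction(1))
print(usual_sign(*sqrt2), succ_positive(sqrt2))   # 1 False
```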


Remark: This example should be offset against the following fact: Z and Q 
can be given the structure of ordered rings (or an ordered field in the case of $\mathbb{Q}$)
in only one way. Here is a quick sketch of the reasoning. As we argued for $\mathbb{Z}_n^+$,
we always have $1 > 0$ because $1 = 1^2 = (-1)^2$. Inductively it follows that the
ordering on Z must have all natural numbers positive, hence using (O2) the 
usual negative integers must be negative in the given ordering. So for Z, only 
the usual ordering works. Since everything in Q is a quotient of integers, the 
same goes for Q (after a little work). 


The same holds for R, but the proof requires the fact that every positive 
real number has a square root, a fact which needs verifying. It is an easy 
consequence of completeness. If $x \in \mathbb{R}$ and $x > 0$, let

$$L = \{y \in \mathbb{R} \mid y > 0 \ \&\ y^2 < x\}.$$

Then $L$ is easily seen to be bounded above and non-empty. By completeness $L$
has a least upper bound $u$; a quick contradiction argument shows that $u^2 = x$.
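The least upper bound $u$ of $L$ can be approached numerically by bisection. One endpoint is kept at or inside $L$ and the other is kept an upper bound for $L$. The Python sketch below works entirely in $\mathbb{Q}$ using `Fraction`, so it only produces rational brackets around $u$. Completeness, not the computation, is what guarantees that $u$ exists in $\mathbb{R}$. The name `sqrt_bracket` and the starting bracket $[0, \max(1, x)]$ are choices made for this illustration.

```python
from fractions import Fraction


def sqrt_bracket(x, steps=40):
    """Return rationals lo <= hi with lo*lo < x <= hi*hi, for a rational x > 0.

    Invariant: lo is 0 or lies in L = {y > 0 : y*y < x}; hi is an upper bound for L.
    Each step halves hi - lo.
    """
    x = Fraction(x)
    lo, hi = Fraction(0), max(Fraction(1), x)   # if y*y < x then y < max(1, x)
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid * mid < x:
            lo = mid        # mid is in L
        else:
            hi = mid        # mid*mid >= x, so mid is an upper bound for L
    return lo, hi


lo, hi = sqrt_bracket(2)
print(float(lo), float(hi))   # both close to 1.41421356...
```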

For any ordering making R an ordered field, all elements of the form 
$y^2$ ($y \in \mathbb{R}$, $y \neq 0$) must be positive, and all elements $-y^2$ ($y \neq 0$) must be negative. By what
we have just said, the positive and negative elements of R (in the usual sense) 
must also be positive and negative respectively in any other ordering, since 
they are precisely the elements in the required forms. Thus only one ordering 
exists making R into an ordered field. 


Example 10.6: We can give the field of rational functions R(t) an ordering 
with interesting properties. (This does not give a notion of size to a function,
but nothing prevents us from imposing an ordering that satisfies axioms
(O1)-(O3).) Define

$$\mathbb{R}(t)^+ = \{f(t) \in \mathbb{R}(t) \mid \exists K \in \mathbb{R} : x \in \mathbb{R},\ x > K \Rightarrow f(x) > 0\},$$

which means that $f(t)$ is considered positive if and only if $f(x)$ is positive for
all sufficiently large $x$. (For instance, $(t^2 - 17)/(5t^2 + 4t)$ is positive in this
sense, but $(t + 1)/(3t - t^2)$ is negative.) This, it may be verified, makes $\mathbb{R}(t)$
into an ordered field. If we identify $\mathbb{R}$ with the set of constant functions, as
above, then the ordering on $\mathbb{R}(t)$ restricts to the usual ordering on $\mathbb{R}$.

Surprisingly, $\mathbb{R} \subset \mathbb{R}(t)$ is now bounded above. In fact, the function $f(t) = t$
is an upper bound. For if $k \in \mathbb{R}$ then the function $g(t) = f(t) - k = t - k$ has
the property that $g(x) > 0$ for all $x > k$, hence $g(t) \in \mathbb{R}(t)^+$, that is, $t > k$. This proves that
$t$ is an upper bound for $\mathbb{R}$ in $\mathbb{R}(t)$.
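Deciding membership of $\mathbb{R}(t)^+$ does not require evaluating at large $x$. For all sufficiently large $x$, a nonzero polynomial has the sign of its leading coefficient, so $p(t)/q(t)$ is positive exactly when the leading coefficients of $p$ and $q$ have the same sign. The Python sketch below encodes that observation. The coefficient-list representation (`coeffs[i]` is the coefficient of $t^i$) and the function names are assumptions made for illustration.

```python
def leading(coeffs):
    """Leading (highest-degree nonzero) coefficient; coeffs[i] multiplies t**i."""
    for c in reversed(coeffs):
        if c != 0:
            return c
    raise ValueError("the zero polynomial has no leading coefficient")


def is_positive(num, den):
    """True if num(t)/den(t) lies in R(t)^+, i.e. is positive for all large x."""
    if all(c == 0 for c in num):
        return False                              # the zero function is not positive
    return (leading(num) > 0) == (leading(den) > 0)


print(is_positive([-17, 0, 1], [0, 4, 5]))   # True:  (t^2 - 17)/(5t^2 + 4t)
print(is_positive([1, 1], [0, 3, -1]))       # False: (t + 1)/(3t - t^2)
print(all(is_positive([-k, 1], [1]) for k in range(-1000, 1001)))   # True: t - k > 0
```

The last line samples only finitely many constants $k$. The argument in the text is what shows $t > k$ for every real $k$.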
This possibility of having ordered fields that contain the real numbers
opens up new avenues in formal mathematics that we will explore later in 
chapter 15. 


Isomorphisms Again 


We have already made use of the concepts ‘isomorphism’ and ‘order iso- 
morphism’ in special cases, and we now discuss them in general. Recall that 
if $R, S$ are rings then $\theta : R \to S$ is an isomorphism if it is a bijection and if for
all $r, s \in R$

$$\theta(r + s) = \theta(r) + \theta(s),$$
$$\theta(rs) = \theta(r)\theta(s). \qquad (10.1)$$


Various axiomatic structures have been proved unique up to isomorphism. 
You may wonder why we can’t do better and actually make them unique. 
The reason is that this is just too much to ask; in any case, it would present 
no real advantages. An isomorphism, after all, is just a change of name (from 
$r$ to $\theta(r)$); so given a ring $R$, we can find lots of isomorphic rings by finding
lots of ways of changing the names. In formal terms, let $S$ be any set for which
there exists a bijection $\theta : R \to S$ (we don't assume $S$ is a ring) and use (10.1)
back-to-front, to define ring operations on $S$ by

$$\theta(r) + \theta(s) = \theta(r + s),$$
$$\theta(r)\theta(s) = \theta(rs).$$


Then S is isomorphic to R. 

How do we know that sets $S$ with suitable bijections exist? Take any element
$t$ whatever, and let $S = R \times \{t\}$; define $\theta$ by $\theta(r) = (r, t)$. This is always a
bijection; and different choices of t lead to different choices of S. This shows 
how wide a variety of sets S can be found—and this is just one very simple 
way to find them. 
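The back-to-front use of (10.1) is a purely mechanical recipe, as the following Python sketch shows. For concreteness it assumes that $R = \mathbb{Z}_5$, represented by residues mod 5, and takes $S = R \times \{t\}$ with $\theta(r) = (r, t)$ as in the text. The helper `transport` is a hypothetical name. The operations on $S$ are defined only through $\theta$ and $\theta^{-1}$, so (10.1) holds by construction.

```python
def transport(add, mul, theta, theta_inv):
    """Define + and * on S by theta(r) + theta(s) = theta(r + s), theta(r)theta(s) = theta(rs)."""
    def add_S(x, y):
        return theta(add(theta_inv(x), theta_inv(y)))

    def mul_S(x, y):
        return theta(mul(theta_inv(x), theta_inv(y)))

    return add_S, mul_S


n, tag = 5, "t"                      # R = Z_5; tag plays the role of the element t


def add_R(r, s):
    return (r + s) % n


def mul_R(r, s):
    return (r * s) % n


def theta(r):
    return (r, tag)                  # theta(r) = (r, t)


def theta_inv(pair):
    return pair[0]


add_S, mul_S = transport(add_R, mul_R, theta, theta_inv)
R = range(n)
assert all(theta(add_R(r, s)) == add_S(theta(r), theta(s)) for r in R for s in R)
assert all(theta(mul_R(r, s)) == mul_S(theta(r), theta(s)) for r in R for s in R)
print(mul_S((3, "t"), (2, "t")))     # (1, 't')
```

Changing `tag` gives a different set $S$ with the same arithmetic, which is the point of the paragraph above.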

Since it is the algebraic operations on a ring that are important, not the 
elements themselves, an isomorphic ring is just as good as the ring we 
start from. So it is too restrictive to expect to specify an algebraic structure 
uniquely. On the other hand, uniqueness up to isomorphism is the most that 
we ever require. 

The same goes for an order isomorphism between two ordered rings R 
and S, which in addition to (10.1) satisfies the condition 


$$r > s \Rightarrow \theta(r) > \theta(s).$$


We cease philosophising to point out some useful, simple consequences 
of (10.1). 
Lemma 10.7: If $\theta : R \to S$ is an isomorphism of rings, then for all $r \in R$,

(a) $\theta(0) = 0$
(b) $\theta(1) = 1$
(c) $\theta(-r) = -\theta(r)$
(d) $\theta(1/r) = 1/\theta(r)$, provided $1/r$ exists.

Proof: For all $r \in R$, we have $r = 0 + r$. Applying $\theta$,

$$\theta(r) = \theta(0 + r) = \theta(0) + \theta(r).$$

Now $\theta$ is onto, so every element of $S$ is of the form $\theta(r)$ for some $r \in R$.
Hence

$$s = \theta(0) + s$$

for all $s \in S$, so by proposition 9.2 of chapter 9, $\theta(0) = 0$. This proves (a), and
(b) is similar. To prove (c),

$$r + (-r) = 0$$
so
$$\theta(r) + \theta(-r) = \theta(0) = 0.$$

By proposition 9.3 of chapter 9, $\theta(-r) = -\theta(r)$. This proves (c), and (d) is
similar.
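As a concrete check of lemma 10.7, the automorphism $\theta(a + b\sqrt{2}) = a - b\sqrt{2}$ of example 10.5 can be tested on a grid of elements of $\mathbb{Q}(\sqrt{2})$. In this Python sketch the pair `(a, b)` of `Fraction`s stands for $a + b\sqrt{2}$, which is an assumed representation. Passing the checks on finitely many samples illustrates the lemma but does not prove it.

```python
from fractions import Fraction


def add(x, y):
    return (x[0] + y[0], x[1] + y[1])


def mul(x, y):
    return (x[0] * y[0] + 2 * x[1] * y[1], x[0] * y[1] + x[1] * y[0])


def neg(x):
    return (-x[0], -x[1])


def inv(x):
    d = x[0] ** 2 - 2 * x[1] ** 2          # nonzero unless x = (0, 0)
    return (x[0] / d, -x[1] / d)


def theta(x):
    return (x[0], -x[1])                   # a + b√2  ->  a - b√2


ZERO, ONE = (Fraction(0), Fraction(0)), (Fraction(1), Fraction(0))
samples = [(Fraction(a), Fraction(b, 3)) for a in range(-3, 4) for b in range(-3, 4)]

assert theta(ZERO) == ZERO and theta(ONE) == ONE            # (a) and (b)
for r in samples:
    assert theta(neg(r)) == neg(theta(r))                   # (c)
    if r != ZERO:
        assert theta(inv(r)) == inv(theta(r))               # (d)
    for s in samples:
        assert theta(add(r, s)) == add(theta(r), theta(s))  # (10.1)
        assert theta(mul(r, s)) == mul(theta(r), theta(s))
print("all checks passed")
```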


Definition 10.8: If R is a ring, then a subring of R is a subset S such that 


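(i) $r, s \in S \Rightarrow r + s \in S$
(ii) $r, s \in S \Rightarrow rs \in S$
(iii) $s \in S \Rightarrow -s \in S$
(iv) $1 \in S$.

From (iv), (iii), (i) it also follows that $0 = 1 + (-1) \in S$. For example, $\mathbb{Z}$ is a
subring of $\mathbb{Q}$ and $\mathbb{Q}$ is a subring of $\mathbb{R}$.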

As with isomorphisms between rings, it is often sufficient to have a subring 
isomorphic to something, instead of actually being that thing. 


Definition 10.9: If R is a field, then a subfield S is a subring that satisfies 
(i)-(iv) and also satisfies 


(v) $s \in S,\ s \neq 0 \Rightarrow s^{-1} \in S$.


For example, Q is a subfield of R. These ideas are applied in the next 
section. 
Some Characterisations


Proposition 10.10: Every ring $R$ contains a subring isomorphic either to
$\mathbb{Z}$ or to $\mathbb{Z}_n$ for some $n$.

Remark: Here we must insist that 0 and 1 are different. If we don't, there is
also the ring $\{0\}$ in which all operations lead to 0.

Proof: Define $\theta : \mathbb{Z} \to R$ by $\theta(0) = 0$, $\theta(1) = 1$, $\theta(n + 1) = \theta(n) + 1$ (using
the recursion theorem) for $n \geq 0$, then let $\theta(-n) = -(\theta(n))$ for $n > 0$. An
induction argument shows that

$$\theta(m + n) = \theta(m) + \theta(n),$$
$$\theta(mn) = \theta(m)\theta(n).$$

If $\theta$ is an injection, we've finished, for then $\theta(\mathbb{Z})$, the image of $\mathbb{Z}$ under $\theta$, is a
subring isomorphic to $\mathbb{Z}$.

However, $\theta$ might not be injective. In this case there exist $r > s \in \mathbb{Z}$ such
that $\theta(r) = \theta(s)$. Therefore $\theta(r - s) = \theta(r) - \theta(s) = 0$. Using the well ordering
property, let $n$ be the smallest natural number such that $n \neq 0$, $\theta(n) = 0$. It
follows that $\theta(0), \theta(1), \dots, \theta(n-1)$ are all different, for if $\theta(r) = \theta(s)$ with
$0 \leq r < s < n$, then $\theta(s - r) = 0$ and this contradicts the definition of $n$.
Also, if

$$u - v = qn \quad (u, v, q \in \mathbb{Z}),$$
then

$$\theta(u) - \theta(v) = \theta(u - v) = \theta(qn) = \theta(q)\theta(n) = \theta(q)0 = 0.$$

Hence, using our notation for $\mathbb{Z}_n$, if $u_n = v_n$ then $\theta(u) = \theta(v)$.
We may therefore define a map $\varphi : \mathbb{Z}_n \to R$ by $\varphi(u_n) = \theta(u)$. The previous
remark shows that $\varphi$ is well defined. Now

$$\varphi(u_n + v_n) = \varphi((u + v)_n) = \theta(u + v) = \theta(u) + \theta(v) = \varphi(u_n) + \varphi(v_n),$$
$$\varphi(u_n v_n) = \varphi((uv)_n) = \theta(uv) = \theta(u)\theta(v) = \varphi(u_n)\varphi(v_n).$$

Since $\theta(0), \dots, \theta(n-1)$ are all different, $\varphi(0_n), \dots, \varphi((n-1)_n)$ are all
different, so $\varphi$ is an injection. Thus $\varphi(\mathbb{Z}_n)$ is a subring of $R$ isomorphic
to $\mathbb{Z}_n$.
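The map $\theta$ of the proof simply counts $1, 1 + 1, 1 + 1 + 1, \dots$ inside $R$, and the two cases depend on whether this count ever returns to $0$. The Python sketch below follows that recipe for a ring supplied as its zero, its unity and its addition. The function name and the search limit are assumptions. A result of `None` means only that no $n$ up to the limit was found, which is consistent with injectivity but does not prove it. The ring $\mathbb{Z}_4 \times \mathbb{Z}_6$ with componentwise operations is used purely as an illustration and does not appear in the text.

```python
from fractions import Fraction


def least_n_with_theta_zero(zero, one, add, limit=10_000):
    """Least n > 0 with theta(n) = 1 + 1 + ... + 1 (n terms) = 0, or None if n > limit."""
    value = zero
    for n in range(1, limit + 1):
        value = add(value, one)          # value is now theta(n)
        if value == zero:
            return n
    return None


def add_z12(a, b):
    return (a + b) % 12


def add_z4_z6(u, v):
    return ((u[0] + v[0]) % 4, (u[1] + v[1]) % 6)


print(least_n_with_theta_zero(0, 1, add_z12))                    # 12
print(least_n_with_theta_zero((0, 0), (1, 1), add_z4_z6))        # 12
print(least_n_with_theta_zero(Fraction(0), Fraction(1),
                              lambda a, b: a + b, limit=1000))   # None
```

In $\mathbb{Z}_4 \times \mathbb{Z}_6$, which has 24 elements, the subring $\varphi(\mathbb{Z}_{12})$ produced by the proof has only 12.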


For fields we get a similar result: 


Proposition 10.11: Every field $F$ contains a subfield isomorphic either to
$\mathbb{Q}$ or to $\mathbb{Z}_p$ where $p$ is prime.

Proof: Using proposition 10.10, $F$ contains a subring $S$ isomorphic to $\mathbb{Z}$ or
to $\mathbb{Z}_n$.

Suppose that $S$ is isomorphic to $\mathbb{Z}$, with $\theta : \mathbb{Z} \to S$ an isomorphism. Define
$\varphi : \mathbb{Q} \to F$ by

$$\varphi(m/n) = \theta(m)/\theta(n) \quad (m, n \in \mathbb{Z},\ n \neq 0).$$

Notice that $n \neq 0 \Rightarrow \theta(n) \neq 0$, since $\theta$ is injective, so the right-hand side
makes sense. Now $\varphi$ is injective, for if $\varphi(m/n) = \varphi(r/s)$ then

$$\theta(m)/\theta(n) = \theta(r)/\theta(s)$$
so
$$\theta(ms) = \theta(m)\theta(s) = \theta(r)\theta(n) = \theta(rn),$$

hence $ms = rn$ and therefore $m/n = r/s$. It is now easy to check that $\varphi(\mathbb{Q})$ is a
subfield isomorphic to $\mathbb{Q}$.

Now suppose that $S$ is isomorphic to $\mathbb{Z}_n$. If $n$ is composite, $n = qr$ with $1 < q, r < n$, then
the elements of $S$ corresponding to $q_n$ and $r_n$ are nonzero, but their product corresponds
to $(qr)_n = 0_n$, so they are zero-divisors in $F$. But a field $F$ has no zero-divisors
(if $xy = 0$ and $y \neq 0$, then $x = xyy^{-1} = 0y^{-1} = 0$). Therefore $n$ is a prime,
say $n = p$; and since $\mathbb{Z}_p$ is a field we have found a subfield of $F$ isomorphic
to $\mathbb{Z}_p$.
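The dichotomy in the second half of the proof can be seen directly in small cases. For composite $n$, $\mathbb{Z}_n$ contains zero-divisors such as $q_n r_n = 0_n$, while for prime $p$ every nonzero element of $\mathbb{Z}_p$ has an inverse. The following Python sketch lists both phenomena by brute force. The function names are hypothetical.

```python
def zero_divisor_pairs(n):
    """Pairs (a, b) of nonzero residues with a <= b and a*b = 0 in Z_n."""
    return [(a, b) for a in range(1, n) for b in range(a, n) if (a * b) % n == 0]


def is_field(n):
    """True if every nonzero element of Z_n has a multiplicative inverse."""
    return n > 1 and all(any((a * b) % n == 1 for b in range(1, n)) for a in range(1, n))


for n in range(2, 13):
    print(n, is_field(n), zero_divisor_pairs(n)[:3])
```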


Next we bring in the order relation. 


Proposition 10.12: Every ordered ring contains a subring order iso- 
morphic to Z. 

Proof: By proposition 10.10 it contains a subring isomorphic to $\mathbb{Z}$ or to $\mathbb{Z}_n$.
The proof that $\mathbb{Z}_n$ cannot be made an ordered ring also shows that it cannot
be a subring of an ordered ring. The proof that the ordering on $\mathbb{Z}$ is unique
shows that the subring isomorphic to $\mathbb{Z}$ is also order isomorphic to $\mathbb{Z}$.


Similarly: 


Proposition 10.13: Every ordered field contains a subfield order iso- 
morphic to Q. 

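Proof: By proposition 10.11 it contains a subfield isomorphic to $\mathbb{Q}$ or to $\mathbb{Z}_p$.
Eliminate the possibility $\mathbb{Z}_p$ as in proposition 10.12; then use uniqueness
of the order on $\mathbb{Q}$.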


These two propositions give simple axiomatic characterisations of $\mathbb{Z}$
and $\mathbb{Q}$:
Z is a minimal ordered ring (that is, Z is an ordered ring with no proper
subring); 


Q is a minimal ordered field (that is, Q is an ordered field with no proper 
subfield). 


These properties define Z and Q uniquely up to isomorphism. For by 
proposition 10.12, any minimal ordered ring must be isomorphic to Z, and 
by proposition 10.13, any minimal ordered field must be isomorphic to Q. 

Finally, we turn to complete ordered fields. To deal with these we must 
extend to them notions such as ‘limit’ and ‘Cauchy sequence’. Thus let $F$ be
an ordered field. By proposition 10.13 it contains a subfield order isomorphic
to $\mathbb{Q}$, and by change of notation we may assume without loss of generality
that this subfield is $\mathbb{Q}$ itself. We say that a sequence $(a_n)$ of elements of $F$ is
Cauchy if:

for every $\varepsilon > 0$, $\varepsilon \in F$, there exists $N \in \mathbb{N}_0$ such that $|a_m - a_n| < \varepsilon$ for $m, n > N$.

The sequence $(a_n)$ tends to a limit $\ell \in F$ if

for every $\varepsilon > 0$, $\varepsilon \in F$, we can find $N \in \mathbb{N}_0$ such that $|a_n - \ell| < \varepsilon$ for all $n > N$.

These are the previous definitions in a broader context. As before, we write

$$\lim a_n = \ell \quad \text{or} \quad \lim_{n \to \infty} a_n = \ell.$$


The key result is: 


Lemma 10.14: In a complete ordered field, every Cauchy sequence has a 
limit. 

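Proof: Let $(a_n)$ be a Cauchy sequence in $F$. By the argument of lemma 9.15
of chapter 9 (carried out in $F$) the sequence is bounded. Hence so is every
subset of elements in the sequence. Define

$$b_n = \text{the least upper bound of } \{a_n, a_{n+1}, a_{n+2}, \dots\}.$$

This exists by completeness. Clearly

$$b_1 \geq b_2 \geq b_3 \geq \cdots$$

and the sequence $(b_n)$ is bounded below, say by any lower bound for $(a_n)$.
Hence we can define

$$c = \text{the greatest lower bound of } (b_n).$$

We claim that $c$ is the limit of the original sequence $(a_n)$.
To prove this, let $\varepsilon > 0$. Suppose that there exist only finitely many values
of $n$ with

$$c - \tfrac{1}{2}\varepsilon < a_n < c + \tfrac{1}{2}\varepsilon.$$

Then we may choose $N$ such that for all $n > N$,
$$a_n \leq c - \tfrac{1}{2}\varepsilon \quad \text{or} \quad a_n \geq c + \tfrac{1}{2}\varepsilon.$$
But there exists $N_1 > N$ such that if $m, n > N_1$ then $|a_m - a_n| < \tfrac{1}{2}\varepsilon$. Hence
$$\text{for all } n > N_1,\ a_n \leq c - \tfrac{1}{2}\varepsilon,$$
or
$$\text{for all } n > N_1,\ a_n \geq c + \tfrac{1}{2}\varepsilon.$$

The latter condition implies that there exists some $m > N_1$ with $a_n > b_m$ for all
$n \geq m$ (choose $m$ with $b_m < c + \tfrac{1}{2}\varepsilon$, which is possible because $c$ is the greatest
lower bound of $(b_n)$), and this contradicts the definition of $b_m$. But the former implies that
for $m > N_1$ we may change $b_m$ to $b_m - \tfrac{1}{2}\varepsilon$, which is still an upper bound of
$\{a_m, a_{m+1}, \dots\}$ because $b_m \geq c$; this again contradicts the definition of $b_m$.
It follows that for any $M$ there exists $m > M$ such that

$$c - \tfrac{1}{2}\varepsilon < a_m < c + \tfrac{1}{2}\varepsilon.$$

Since $(a_n)$ is Cauchy, there exists $M_1$ such that $|a_n - a_m| < \tfrac{1}{2}\varepsilon$ for
$m, n > M_1$. Choosing $m > M_1$ as above, we find that for all $n > M_1$,

$$c - \varepsilon < a_n < c + \varepsilon.$$

But this implies that $\lim a_n = c$ as claimed.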
The next step is: 


Lemma 10.15: Let $F \supset \mathbb{Q}$ be a complete ordered field. If $x \in F$ then there
exists $p \in \mathbb{Z}$ such that $p - 1 \leq x < p$.

Proof: Suppose $n \leq x$ for all $n \in \mathbb{Z}$. Then $\mathbb{Z}$ is bounded above by $x$, so by
completeness has a least upper bound $k$. Hence $n + 1 \leq k$ for all $n \in \mathbb{Z}$,
because also $n + 1 \in \mathbb{Z}$. This implies that $n \leq k - 1$, so $k - 1$ is a smaller upper
bound for $\mathbb{Z}$. This contradicts the definition of $k$. Therefore $x < n$ for some
$n \in \mathbb{Z}$. Similarly $m \leq x$ for some $m \in \mathbb{Z}$. Since there are only finitely many
integers between $m$ and $n$, we can find an integer $p$ that is the smallest such
that $x < p$. Then $p - 1 \leq x < p$.
As a final preparatory step:


Lemma 10.16: Let $F$ be a complete ordered field, and let $(a_n)$ and $(b_n)$ be
two sequences with limits $a$ and $b$ respectively. Then

(a) $\lim (a_n + b_n) = a + b$
(b) $\lim (a_n b_n) = ab$.

Proof: For (a), copy the proof of theorem 7.1 in chapter 7 and check that formally
it still makes sense. For (b), use the argument of lemma 9.15, chapter 9,
to show that for all $n \in \mathbb{N}_0$, $|a_n| < A$, $|b_n| < B$ for some positive $A, B \in F$; then
also $|a| \leq A$. Then if $\varepsilon > 0$ we have $\varepsilon/(A + B) > 0$. Hence there exists $N_1$ such
that for $n > N_1$,

$$|a_n - a| < \varepsilon/(A + B)$$
and there exists $N_2$ such that for $n > N_2$,
$$|b_n - b| < \varepsilon/(A + B).$$
Hence for $n > N = \max(N_1, N_2)$,

$$\begin{aligned}
|a_n b_n - ab| &= |(a_n - a)b_n + a(b_n - b)| \\
&< (\varepsilon/(A + B))B + A\varepsilon/(A + B) \\
&= \varepsilon.
\end{aligned}$$

This proves (b).


For R we get an even stronger statement than propositions 10.12 and 10.13. 


Theorem 10.17: Every complete ordered field is order isomorphic to R. 


Proof: Let F be a complete ordered field. By proposition 10.13 it has a subfield 
order isomorphic to Q. As usual, for notational convenience we identify this 
subfield with $\mathbb{Q}$, so that without loss of generality $\mathbb{Q} \subset F$.

Elements of $\mathbb{R}$ are equivalence classes $[a_n]$ of Cauchy sequences $(a_n)$ of
rationals. Define a map $\theta : \mathbb{R} \to F$ by

$$\theta([a_n]) = \lim a_n.$$

First we need to check that this makes sense. The reason for this is that, in
the construction of $\mathbb{R}$ from $\mathbb{Q}$, we defined a Cauchy sequence $(a_n)$ to have
terms $a_n \in \mathbb{Q}$ and in the definition we used only rational values for $\varepsilon$. When
we speak of the limit in $F$, we need to allow any $\varepsilon > 0$ that belongs to $F$. We
claim that given $\varepsilon > 0$ where $\varepsilon \in F$, there is a rational $\varepsilon'$ with $0 < \varepsilon' < \varepsilon$.
To prove this, note that $1/\varepsilon \in F$ and, by lemma 10.15, $1/\varepsilon < p$ for some
$p \in \mathbb{Z}$. Then $p > 0$, and $0 < 1/p \in \mathbb{Q}$. Take $\varepsilon' = 1/p \in \mathbb{Q}$; then $\varepsilon' < \varepsilon$. Now because $(a_n)$
is Cauchy in $\mathbb{Q}$, it follows that there exists $N \in \mathbb{N}_0$ such that for all $m, n > N$

$$|a_m - a_n| < \varepsilon'.$$
Hence for all $m, n > N$,
$$|a_m - a_n| < \varepsilon,$$

and $(a_n)$ is Cauchy in $F$. By lemma 10.14, $\lim a_n$ exists in $F$.

It is easy to see, using similar arguments, that $\theta$ is well defined and injective;
and lemma 10.16 proves that $\theta(\mathbb{R})$ is a subring of $F$ isomorphic to $\mathbb{R}$. It is
easy to check that $\theta$ preserves the order relation.

It remains to prove that $\theta$ is surjective. Let $x \in F$. By lemma 10.15 there
exists an integer $a_0$ with $a_0 \leq x < a_0 + 1$. Inductively (and using lemma 10.15
again) we can find integers $a_i$ between 0 and 9 such that

$$a_0 + \frac{a_1}{10} + \cdots + \frac{a_n}{10^n} \leq x < a_0 + \frac{a_1}{10} + \cdots + \frac{a_n + 1}{10^n}.$$

Then if $b_n = a_0 + \dfrac{a_1}{10} + \cdots + \dfrac{a_n}{10^n}$ we have
$$|b_n - x| < 1/10^n$$

and it follows easily (using a similar argument to that in the second paragraph of this proof) that

$$\lim b_n = x.$$
Also $(b_n)$ is Cauchy in $\mathbb{Q}$, hence $[b_n] \in \mathbb{R}$, and
$$\theta([b_n]) = \lim b_n = x.$$

Therefore $\theta$ is surjective.
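The integers $a_0, a_1, a_2, \dots$ in the surjectivity argument are exactly the usual decimal digits: $a_0 + a_1/10 + \cdots + a_k/10^k$ equals $\lfloor 10^k x \rfloor / 10^k$. The Python sketch below assumes we can compute $\lfloor 10^k x \rfloor$ exactly. That is possible for a rational $x$ via `math.floor` on a `Fraction`, and for $x = \sqrt{2}$ via the integer square root `math.isqrt`. The function name is hypothetical.

```python
import math
from fractions import Fraction


def decimal_digits(floor_scaled, n):
    """Digits a_0, ..., a_n with b_n = a_0 + a_1/10 + ... + a_n/10**n <= x < b_n + 1/10**n.

    floor_scaled(k) must return floor(10**k * x) exactly.
    """
    digits = [floor_scaled(0)]
    for k in range(1, n + 1):
        digits.append(floor_scaled(k) - 10 * floor_scaled(k - 1))
    return digits


x = Fraction(22, 7)
print(decimal_digits(lambda k: math.floor(10 ** k * x), 6))         # [3, 1, 4, 2, 8, 5, 7]
print(decimal_digits(lambda k: math.isqrt(2 * 10 ** (2 * k)), 6))   # [1, 4, 1, 4, 2, 1, 3]
```

For negative $x$ the first entry $a_0$ is negative while the later digits still lie between 0 and 9, matching the inequality $a_0 \leq x < a_0 + 1$ above.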


The Connection with Intuition 


We can now tidy up our ideas a little more. We have two types of model of 
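the relevant axiom systems: the formal models $\mathbb{N}_0, \mathbb{Z}, \mathbb{Q}, \mathbb{R}$ built by the
constructions, and the informal, intuitive models of the natural numbers, integers,
rationals and reals. We can explain, plausibly and intuitively, why the informal real
numbers form a complete ordered field. Then on this intuitive level theorem 10.17 tells
us that the informal and the formal real numbers are isomorphic. That is, the formal
construction vindicates intuition, and can be used to justify all of the properties that
we expect of the informal real numbers. We have therefore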
reached the stage where it doesn’t greatly matter whether we use the informal 
R or the formal R. The work we have done renders both equally safe, and 
there is now no essential difference between them. 
Why, then, did we bother? Because we don’t know this until we’ve gone
through the constructions. 

To summarise: this chapter and the last show between them that we can 
build up the number systems in two ways. Either we 


(a) postulate the existence of $\mathbb{N}_0$ and construct $\mathbb{Z}, \mathbb{Q}, \mathbb{R}$ in turn;
or we
(b) postulate the existence of $\mathbb{R}$ and construct $\mathbb{Q}, \mathbb{Z}, \mathbb{N}_0$ in turn.


By judicious combination of the two methods we can therefore start anywhere,
such as $\mathbb{Z}$ or $\mathbb{Q}$, and obtain the remaining systems by using chapter 9
to work upwards and this one to work downwards. And the uniqueness 
theorems proved in this chapter show that it makes no essential difference 
which method we use: the results are always isomorphic and agree with our 
intuitive ideas. Precisely where we start has now become a matter of taste ra- 
ther than a matter of urgency. From any of the different starting points we 
can provide an equally logical development of all of the usual number sys- 
tems, and recover all of the standard results of elementary arithmetic, from 
an axiomatic basis. 


Exercises 


1. Write out a full proof of proposition 10.13. 


2. Prove that in any ordered field $F$,
$$a^2 + 1 > 0 \text{ for all } a \in F.$$

Deduce that if the equation $x^2 + 1 = 0$ has a solution in a field, that
field cannot be ordered. Find all the solutions of $x^2 + 1 = 0$ in the fields
$\mathbb{Z}_2, \mathbb{Z}_3, \mathbb{Z}_5$.


3. Use the Euclidean algorithm to show that given $m, n \in \mathbb{N}$, there is a
technique for calculating $a, b \in \mathbb{Z}$ such that $am + bn = h$, where $h$ is
the highest common factor of $m, n$. Deduce that if $m, n$ are coprime,
then there exist integers $a, b$ such that $am + bn = 1$. Find $a, b$ when
$m = 1008$, $n = 1375$. Calculate the multiplicative inverse of $1008_{1375}$
in $\mathbb{Z}_{1375}$.

Show that $m_n$ has a multiplicative inverse in $\mathbb{Z}_n$ if and only if $m$ and
$n$ are coprime.


4. In an ordered ring, prove that for all $x, y$

$$|x| - |y| \leq |x + y| \leq |x| + |y|.$$
5. From the axioms of a complete ordered field, prove that every positive
element $a$ of $\mathbb{R}$ has a unique positive square root. (Hint: consider
$\{x \in \mathbb{Q} \mid x^2 < a\}$.)

6. Prove by induction that $0 < a < b \Rightarrow a^n < b^n$ in an ordered ring $R$.
Given $a \in R$, $a > 0$, show that if there exists an element $r \in R$ such
that $r > 0$ and $r^n = a$, then it is unique.

7. Show that every positive element in a complete ordered field has a
unique $n$th root. (Hint: Consider $\{x \mid x^n < a\}$.)

8. Use exercise 7 to define $x^{p/q}$ for a positive element $x$ in a complete
ordered field and a rational number $p/q$.

9. Define a field $\mathbb{Q}(\sqrt{3})$ analogous to $\mathbb{Q}(\sqrt{2})$ and show that there are two
different ways of making it into an ordered field.

10. Show that the two orderings mentioned for $\mathbb{Q}(\sqrt{2})$ are the only order
relations under which it is an ordered field.

11. Find a field with exactly four different orderings which make it an
ordered field.

12. Let $\mathbb{R}[t]$ be the ring of polynomials $p(t) = a_n t^n + a_{n-1} t^{n-1} + \cdots + a_0$
with real coefficients. Define the relation $>$ by

$$p(t) > q(t) \iff p(0) > q(0).$$

Does this make $\mathbb{R}[t]$ into an ordered ring?

13. Cauchy sequences in a general ordered field
In our construction of $\mathbb{R}$ from $\mathbb{Q}$ we started with Cauchy sequences
in $\mathbb{Q}$ and defined a Cauchy sequence using a value of $\varepsilon \in \mathbb{Q}$. In lemma
10.14, we were able to show that in a complete ordered field $F$ a Cauchy
sequence using any value of $\varepsilon \in F$ will also converge. But what
happens in an ordered field that is not complete?

Consider the relationship between the following definitions in any
ordered field $F$:

A sequence $(a_n)$ in $F$ is said to converge to the limit $a \in F$ if given
$\varepsilon \in F$, $\varepsilon > 0$, $\exists N \in \mathbb{N}$ such that $n > N \Rightarrow |a_n - a| < \varepsilon$.

A sequence $(a_n)$ in an ordered field $F$ is said to be a Cauchy sequence in $F$
if given $\varepsilon \in F$, $\varepsilon > 0$, $\exists N \in \mathbb{N}$ such that $m, n > N \Rightarrow |a_m - a_n| < \varepsilon$.

The field $F$ is said to be Cauchy complete if all Cauchy sequences in $F$
tend to a limit in $F$.

(a) Prove using the completeness axiom (C) that a complete ordered
field is Cauchy complete.
(b) Prove that a complete ordered field satisfies Archimedes’ condition:

if $\varepsilon \in F$, $\varepsilon > 0$, then $1/10^n < \varepsilon$ for some $n \in \mathbb{N}$.


14. Let $F = \mathbb{R}(t)$ be the field of example 10.3, page 214, with the ordering of
example 10.6. Let $\varepsilon = 1/t$.
(a) Show that $0 < \varepsilon < 1/n$ for all $n \in \mathbb{N}$.
(b) Using the general definition of limit (from question 13), prove
that the sequence $(1/n)$ does not tend to the limit 0 in $F$.
(c) Show that in a complete ordered field, the sequence $1/n \to 0$.
(d) Prove that an ordered field $F$ is complete if and only if it satisfies
both Cauchy completeness and Archimedes’ condition.
CHAPTER 11


Complex Numbers 
and Beyond 


Complex numbers are still regarded by some with a mixture of suspicion
and awe, but to a modern mathematician they are just a simple
set-theoretic extension of the real numbers. In this chapter we show
how to construct them from $\mathbb{R}$, completing the standard hierarchy of number
systems $\mathbb{N}_0 \subset \mathbb{Z} \subset \mathbb{Q} \subset \mathbb{R} \subset \mathbb{C}$.

We could go on to look for an extension of C. The nineteenth century 
mathematician Sir William Rowan Hamilton found one, which he named 
the quaternions. We describe this briefly, just to show what’s involved. How- 
ever, the moral of modern mathematics is that we must broaden our horizons 
and look to axiomatic systems that describe more general mathematical 
structures. The concept of number is but a part of this study. 

Modern algebra concerns itself with axiomatic systems which, broadly 
speaking, consist of sets with various operations on them. We’ve already met 
two, namely rings and fields, but there are many others. This is not an alge- 
bra book, so we won’t study any of them in detail, but it’s worth mentioning 
the important ones. Looking beyond complex numbers, the more fruitful 
direction is not towards Hamilton’s quaternions, but to the generalised al- 
gebraic structures of modern algebra. However, quaternions do have their 
place, and in some areas of today’s mathematics they are important in their 
own right. 


Historical Background 


In chapter 1 we mentioned the problems associated with the acceptance 
of complex numbers as a genuine concept. It’s worth pausing briefly to 
look at a historical outline, because it may help you to become aware of 
misconceptions that often occur. 
At the beginning of the sixteenth century there was much interest in
solving algebraic equations, one of which was: 


Find two numbers whose sum is 10 and product is 40. 

In modern notation this problem leads to the equations

$$x + y = 10,$$
$$xy = 40.$$

Substituting for $y$ from the first equation into the second, we find
$$x(10 - x) = 40,$$
so
$$x^2 - 10x + 40 = 0,$$

with solutions

$$x = 5 \pm \sqrt{-15}.$$

If $x = 5 + \sqrt{-15}$ then $y = 5 - \sqrt{-15}$, so the solution is the pair of
expressions

$$5 + \sqrt{-15},\quad 5 - \sqrt{-15}.$$


Sixteenth-century mathematicians realised that these expressions could
not be real numbers. The square of any real number is non-negative, so $-15$ is
not the square of a real number, and $\sqrt{-15}$ cannot be real. Nevertheless,
manipulating these expressions as if they were numbers, they found
that whatever $\sqrt{-15}$ might be, when they added the solutions, the terms
$\pm\sqrt{-15}$ cancelled, giving

$$(5 + \sqrt{-15}) + (5 - \sqrt{-15}) = 10,$$
and when they multiplied them, they got

$$\begin{aligned}
(5 + \sqrt{-15})(5 - \sqrt{-15}) &= 5^2 - (\sqrt{-15})^2 \\
&= 25 - (-15) \\
&= 40.
\end{aligned}$$

In short, by treating $\sqrt{-15}$ as an ‘imaginary’ number and manipulating it
algebraically as if its square is $-15$, the expressions $5 + \sqrt{-15}$, $5 - \sqrt{-15}$
solve the problem.
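Python’s built-in complex numbers make the sixteenth-century calculation easy to repeat. The caveat is that they use floating-point arithmetic, so the product equals 40 only up to rounding. In this sketch `cmath.sqrt(-15)` plays the part of $\sqrt{-15}$.

```python
import cmath

r = cmath.sqrt(-15)          # a complex number whose square is -15 (up to rounding)
x, y = 5 + r, 5 - r
print(x + y)                 # (10+0j)
print(x * y)                 # 40, up to floating-point rounding
print(r * r)                 # -15, up to floating-point rounding
```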
Any positive real number $a$ has a positive square root $\sqrt{a}$. The square root
of a negative real number $-a$ ($a > 0$), if there were such a thing, could be written
$\sqrt{-a} = \sqrt{-1}\,\sqrt{a}$. The eighteenth-century mathematician Leonhard
Euler introduced the symbol $i$ for $\sqrt{-1}$, so that $\sqrt{-a} = i\sqrt{a}$. An expression
of the form $x + iy$ where $x, y \in \mathbb{R}$ was called a complex number,
though it was still not clear what this really was. Using complex numbers,
any quadratic equation

$$ax^2 + bx + c = 0 \quad (a, b, c \in \mathbb{R},\ a \neq 0)$$

has solutions of the form

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \quad \text{for } b^2 \geq 4ac,$$

and

$$x = \frac{-b \pm i\sqrt{4ac - b^2}}{2a} \quad \text{for } b^2 < 4ac.$$

In other words, if $b^2 \geq 4ac$ then the equation has real solutions, but if
$b^2 < 4ac$ it does not, although it does have complex ones.
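Both cases of the formula can be handled at once if complex square roots are allowed. The following Python sketch uses `cmath.sqrt`, which returns $i\sqrt{4ac - b^2}$ when the discriminant $b^2 - 4ac$ is negative. The function name `quadratic_roots` is a hypothetical choice, and the results are floating-point approximations.

```python
import cmath


def quadratic_roots(a, b, c):
    """Both roots of a x^2 + b x + c = 0 with a != 0, as Python complex numbers."""
    if a == 0:
        raise ValueError("a must be nonzero")
    root = cmath.sqrt(b * b - 4 * a * c)   # i*sqrt(4ac - b^2) when b^2 < 4ac
    return (-b + root) / (2 * a), (-b - root) / (2 * a)


for coeffs in [(1, -3, 2), (1, -10, 40)]:
    r1, r2 = quadratic_roots(*coeffs)
    print(coeffs, r1, r2)
```

The second equation is the sixteenth-century problem above, and its roots come out close to $5 \pm i\sqrt{15}$.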

At the time, this discovery set up a dichotomy between real (in the sense of 
genuine) and imaginary (in the sense of non-existent) solutions to equations. 
Complex numbers were saddled with the psychological overtones associ- 
ated with the word ‘imaginary’. (Complex did not mean ‘complicated’; it 
meant ‘composed of several parts’, namely x and y. It still means that, but the 
psychological overtones get worse if you think it means ‘complicated’.) 

In 1806 the French mathematician Jean-Robert Argand described a com- 
plex number x + iy as a point in the plane: 


[Fig. 11.1 A complex number as a point in the plane: the point $x + iy$]
The horizontal axis became the real axis, the vertical axis the imaginary
axis, and the number $i$ was seen as the point one unit up on the imaginary
axis.

[Fig. 11.2 The real and imaginary axes]


This description was given the name ‘Argand diagram’, and it is by this 
name that the picture of complex numbers as points in the plane is often 
known today, although the idea was put forward earlier in the doctoral 
thesis of the great German mathematician Carl Friedrich Gauss (1799), and 
even this was predated by the little-known work of the Danish surveyor 
Caspar Wessel (1797)—such are the vagaries of historical acknowledgement. 
Although the complex numbers were now described concretely as points 
in the plane, the mystification of earlier eras still shrouded them for most 
people. Gauss realised that the description could be made even simpler: he 
clearly regarded a complex number as a pair (x, y) of real numbers. In the 
1830s, the Irish mathematician Hamilton canonised complex numbers as 
‘couples of real numbers’ (a couple being his name for an ordered pair). This 
is the heart of the matter and the key to the modern description: a point in 
the plane is an ordered pair (x, y), and the symbol x + iy is just another name 
for that point or that pair. The mysterious expression i is none other than 
the ordered pair (0, 1). 


Construction of the Complex Numbers 


We often describe Argand’s representation of complex numbers as the ‘complex
plane’, but as a set it is precisely the same as the ‘real plane’ $\mathbb{R}^2$.¹
However, in this context, it is useful to introduce a special notation $\mathbb{C}$ as
an alternative name for $\mathbb{R}^2$, the set of ordered pairs $(x, y)$ for $x, y \in \mathbb{R}$. We
then define addition and multiplication on $\mathbb{C}$ by

$$(x_1, y_1) + (x_2, y_2) = (x_1 + x_2,\ y_1 + y_2), \qquad (11.1)$$
$$(x_1, y_1)(x_2, y_2) = (x_1 x_2 - y_1 y_2,\ x_1 y_2 + x_2 y_1). \qquad (11.2)$$

¹ Modern algebraic geometers call $\mathbb{C}$ the complex line. To them the complex plane is
$\mathbb{C}^2 = \mathbb{C} \times \mathbb{C}$. You just have to get used to this kind of thing.


It is a simple matter to check that $\mathbb{C}$ is a field under these operations with zero
$(0, 0)$ and unit $(1, 0)$. The negative of $(x, y)$ is $(-x, -y)$ and if $(x, y) \neq (0, 0)$,
the multiplicative inverse of $(x, y)$ is

$$\left(\frac{x}{x^2 + y^2},\ \frac{-y}{x^2 + y^2}\right).$$

Define $f : \mathbb{R} \to \mathbb{C}$ by $f(x) = (x, 0)$. Then
$$f(x_1 + x_2) = (x_1 + x_2, 0) = (x_1, 0) + (x_2, 0) = f(x_1) + f(x_2)$$
and
$$f(x_1 x_2) = (x_1 x_2, 0) = (x_1, 0)(x_2, 0) = f(x_1) f(x_2).$$

The function $f$ is clearly an injection and so is an isomorphism of fields, from
$\mathbb{R}$ onto the subfield $f(\mathbb{R}) \subset \mathbb{C}$. This subfield $f(\mathbb{R})$ is none other than the ‘real
axis’ of Argand’s description. 

As usual, we consider R to be a subset of C via this isomorphism, which 
amounts to regarding the real numbers as the real axis in the complex plane 
and replacing the symbol (x, 0) by x. 

Define i to be the ordered pair (0, 1). Using (11.2), 


$$i^2 = (0, 1)^2 = (-1, 0).$$

Thinking of $(-1, 0)$ as the real number $-1$, this gives $i^2 = -1$.
More generally, using (11.1) and (11.2), 


(x, 0) + (0, 1)(y,0) = (x, 0) + (0, y) = (x,y). 
Replacing (x, 0), (y, 0) by x, y € R respectively, we get 
x+iy = (x,y). 
The complex number x + iy is another name for the ordered pair (x, y). 
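Definitions (11.1) and (11.2) can be transcribed directly. In this Python sketch a complex number is just a pair of `Fraction`s. It deliberately avoids Python’s built-in `complex` type, so nothing beyond ordered pairs is used, though for exactness the coordinates are restricted to rationals rather than arbitrary reals. The function names are hypothetical.

```python
from fractions import Fraction


def c_add(z, w):
    """(11.1): (x1, y1) + (x2, y2) = (x1 + x2, y1 + y2)."""
    return (z[0] + w[0], z[1] + w[1])


def c_mul(z, w):
    """(11.2): (x1, y1)(x2, y2) = (x1 x2 - y1 y2, x1 y2 + x2 y1)."""
    return (z[0] * w[0] - z[1] * w[1], z[0] * w[1] + w[0] * z[1])


def c_inv(z):
    """Inverse of (x, y) != (0, 0): (x / (x^2 + y^2), -y / (x^2 + y^2))."""
    x, y = z
    d = x * x + y * y
    return (x / d, -y / d)


i = (Fraction(0), Fraction(1))
assert c_mul(i, i) == (-1, 0)                        # i^2 = (-1, 0)
z = (Fraction(3), Fraction(2))
assert c_mul(z, c_inv(z)) == (1, 0)                  # the unit (1, 0)
x, y = Fraction(5), Fraction(-7)
assert c_add((x, 0), c_mul(i, (y, 0))) == (x, y)     # x + iy = (x, y)
print("definitions (11.1) and (11.2) behave as expected")
```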


COMMENT There is an occasional misconception that a complex number is
$x + iy$ where $x, y$ are real and $y \neq 0$, reserving the name ‘real number’ for
x + iy where y = 0. Mathematicians regard all expressions x + iy (x,y € R) as 
complex numbers, and this includes real numbers. 

Returning to the definitions of addition and multiplication, (11.1) and (11.2) 
in this notation, we find 
$$(x_1 + iy_1) + (x_2 + iy_2) = (x_1 + x_2) + i(y_1 + y_2),$$

$$(x_1 + iy_1)(x_2 + iy_2) = (x_1 x_2 - y_1 y_2) + i(x_1 y_2 + x_2 y_1).$$


We thus recover the usual addition and multiplication rules for complex 
numbers, which is why definitions (11.1), (11.2) were set up in the first place. 

Historically, in the expression x + iy, x is referred to as the ‘real part’ and 
y as the ‘imaginary part’. Both x and y are real numbers, being the first and 
second coordinates of the ordered pair $(x, y) \in \mathbb{R}^2$. If

$$x_1 + iy_1 = x_2 + iy_2$$
then

$$(x_1, y_1) = (x_2, y_2),$$

and by the usual properties of ordered pairs,

$$x_1 = x_2, \quad y_1 = y_2.$$


Historically this deduction was referred to as ‘comparing real and imagin- 
ary parts’; we now see it as an application of the set-theoretic definition of 
ordered pairs. 

A modern interpretation of the solution of the quadratic $x^2 - 10x + 40 = 0$
is that there are no solutions in R, but if we consider this as an equation in
C, there are solutions $5 \pm i\sqrt{15}$. This behaviour is no more ‘complex’ than
what happens with the equation 2x = 1 in N and Q. There is no solution in
N, but in Q there is the solution $x = \tfrac{1}{2}$.

Time and again in mathematics, a problem has no solution in a given 
context, but it does have one when interpreted in a wider context. Don’t be 
surprised by this phenomenon, or give it unwarranted mystical significance. 
More gadgets to solve something may lead to more solutions. 


Complex Conjugation 


A complex number x + iy is also denoted by a single symbol z (or any other
suitable letter, for that matter). When we write z = x + iy, we always suppose
that $x, y \in \mathbb{R}$, unless something is stated to the contrary.

If z = x + iy, with $x, y \in \mathbb{R}$, then the real part of z is
$$\operatorname{Re}(z) = x$$
and the imaginary part of z is
$$\operatorname{Im}(z) = y.$$
We also define the conjugate of z = x + iy to be
$$\bar{z} = x - iy.$$
For instance, $\overline{3 + 2i} = 3 - 2i$, $\overline{1 - 2i} = 1 + 2i$, and so on. Conjugation has
certain elementary properties, which we collect together as:


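Proposition 11.1:

(a) $\overline{z_1 + z_2} = \bar{z}_1 + \bar{z}_2$
(b) $\overline{z_1 z_2} = \bar{z}_1 \bar{z}_2$
(c) $\bar{\bar{z}} = z$
(d) $\bar{z} = z \iff z \in \mathbb{R}$.

Proof: Elementary checking of definitions.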


If we define $c : \mathbb{C} \to \mathbb{C}$ by $c(z) = \bar{z}$, then proposition 11.1 tells us that c is an
automorphism of the field C, and that it is the identity when restricted to R.
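Proposition 11.1 is proved by checking definitions; the following Python sketch merely spot-checks the four properties on a few sample values using the built-in `complex.conjugate()` method. Exact equality is safe here only because the samples have small integer parts, so the floating-point arithmetic involved happens to be exact.

```python
# Spot-check of proposition 11.1 on sample complex numbers.
# Python's complex.conjugate() maps x + iy to x - iy.

samples = [complex(3, 2), complex(1, -2), complex(-5, 0), complex(0, 1)]

for z1 in samples:
    for z2 in samples:
        # (a) the conjugate of a sum is the sum of the conjugates
        assert (z1 + z2).conjugate() == z1.conjugate() + z2.conjugate()
        # (b) the conjugate of a product is the product of the conjugates
        assert (z1 * z2).conjugate() == z1.conjugate() * z2.conjugate()
    # (c) conjugating twice gives back z
    assert z1.conjugate().conjugate() == z1
    # (d) z equals its conjugate exactly when z is real
    assert (z1.conjugate() == z1) == (z1.imag == 0)

print("proposition 11.1 holds on all samples")
```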


The Modulus 


If z = x + iy where $x, y \in \mathbb{R}$, then $x^2 + y^2 \geq 0$. Any non-negative real number has a
unique non-negative square root. The modulus or absolute value of $z \in \mathbb{C}$ is

$$|z| = \sqrt{x^2 + y^2}.$$

For instance, $|3 + 2i| = \sqrt{3^2 + 2^2} = \sqrt{13}$, $|-5| = \sqrt{25} = 5$. In particular, for
any real number x, $|x| = \sqrt{x^2}$ and, since the non-negative square root is taken, this
reduces to the usual definition of modulus in the real case,

$$|x| = \begin{cases} x & \text{for } x \geq 0, \\ -x & \text{for } x < 0, \end{cases} \qquad x \in \mathbb{R}.$$


In geometric terms, the modulus is the distance from the origin to the 
point x + iy in the complex plane. 


Fig. 11.3 The distance from the origin 


II COMPLEX NUMBERS AND BEYOND | 235 


If $z_1 = x_1 + iy_1$, $z_2 = x_2 + iy_2$, then

$$|z_1 - z_2| = |(x_1 - x_2) + i(y_1 - y_2)| = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}.$$

This is the distance from the point $z_1$ to the point $z_2$ in the plane:


Fig. 11.4 The modulus of a complex number 


Proposition 11.2:

(a) $|z| \in \mathbb{R}$, $|z| \geq 0$ for all $z \in \mathbb{C}$
(b) $|z| = 0 \iff z = 0$
(c) $|z|^2 = z\bar{z}$
(d) $|z_1 z_2| = |z_1|\,|z_2|$
(e) $|z_1 + z_2| \leq |z_1| + |z_2|$.


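Proof: Parts (a) and (b) are straightforward. Part (c) follows from the
definitions, for if z = x + iy then
$$z\bar{z} = (x + iy)(x - iy) = x^2 - (iy)^2 = x^2 + y^2 = |z|^2.$$
(d) Since $|z| \geq 0$ for all $z \in \mathbb{C}$, it is sufficient to show
$$|z_1 z_2|^2 = |z_1|^2 |z_2|^2.$$
But
$$\begin{aligned} |z_1 z_2|^2 &= (z_1 z_2)(\overline{z_1 z_2}) && \text{by 11.2(c)} \\ &= z_1 z_2 \bar{z}_1 \bar{z}_2 && \text{by 11.1(b)} \\ &= z_1 \bar{z}_1 z_2 \bar{z}_2 \\ &= |z_1|^2 |z_2|^2. \end{aligned}$$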
(e) A frontal attack on this inequality leads to some intricate algebra, which
can be forced through. However, we can be more refined, but less direct, by
writing $z_1 = x_1 + iy_1$, $z_2 = x_2 + iy_2$ and considering the identity
$$(x_1^2 + y_1^2)(x_2^2 + y_2^2) - (x_1x_2 + y_1y_2)^2 = (x_1y_2 - x_2y_1)^2,$$
which immediately tells us that
$$(x_1x_2 + y_1y_2)^2 \leq (x_1^2 + y_1^2)(x_2^2 + y_2^2) = |z_1|^2 |z_2|^2.$$
Taking square roots yields
$$x_1x_2 + y_1y_2 \leq |z_1|\,|z_2|,$$
which is valid even if $x_1x_2 + y_1y_2$ is negative. Hence
$$2(x_1x_2 + y_1y_2) \leq 2|z_1|\,|z_2|,$$
which gives
$$x_1^2 + 2x_1x_2 + x_2^2 + y_1^2 + 2y_1y_2 + y_2^2 \leq x_1^2 + y_1^2 + 2|z_1|\,|z_2| + x_2^2 + y_2^2;$$
this simplifies to
$$(x_1 + x_2)^2 + (y_1 + y_2)^2 \leq |z_1|^2 + 2|z_1|\,|z_2| + |z_2|^2,$$
which is
$$|z_1 + z_2|^2 \leq (|z_1| + |z_2|)^2.$$

Since the modulus is non-negative, we can take square roots to give

$$|z_1 + z_2| \leq |z_1| + |z_2|.$$
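The proof of part (e) hinges on the identity $(x_1^2 + y_1^2)(x_2^2 + y_2^2) - (x_1x_2 + y_1y_2)^2 = (x_1y_2 - x_2y_1)^2$. A short Python sketch can confirm it exactly on a grid of integer values (integers avoid rounding error) and spot-check parts (d) and (e) with the built-in `abs`, which returns the modulus of a complex number. This is a sanity check on examples, not a substitute for the algebraic proof.

```python
import itertools
import math

# Exact check of the identity used in part (e), over a grid of integers.
values = range(-4, 5)
for x1, y1, x2, y2 in itertools.product(values, repeat=4):
    lhs = (x1**2 + y1**2) * (x2**2 + y2**2) - (x1 * x2 + y1 * y2) ** 2
    assert lhs == (x1 * y2 - x2 * y1) ** 2

# Floating-point spot-checks of parts (d) and (e); abs(z) is the modulus.
z1, z2 = complex(3, 2), complex(-1, 4)
assert math.isclose(abs(z1 * z2), abs(z1) * abs(z2))
assert abs(z1 + z2) <= abs(z1) + abs(z2)
print(abs(z1 + z2), abs(z1) + abs(z2))
```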


Part (c) of this proposition gives a nice description of the reciprocal of
z = x + iy when $z \neq 0$, for then $|z|^2 = x^2 + y^2 \neq 0$, so the equation $z\bar{z} = |z|^2$
implies

$$z\bar{z}/|z|^2 = 1,$$
so
$$z^{-1} = \bar{z}/|z|^2.$$
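The formula $z^{-1} = \bar{z}/|z|^2$ gives an explicit recipe for division. Here is a minimal sketch that works on pairs (x, y), as in the ordered-pair construction, and uses Python's `fractions.Fraction` so that the arithmetic is exact; `reciprocal` and `mul_pairs` are hypothetical helper names introduced for illustration.

```python
from fractions import Fraction


def reciprocal(x, y):
    """Pair for 1/(x + iy) = (x - iy)/(x^2 + y^2), assuming (x, y) != (0, 0)."""
    m = Fraction(x) ** 2 + Fraction(y) ** 2   # |z|^2, nonzero when z != 0
    if m == 0:
        raise ZeroDivisionError("0 has no reciprocal")
    return (Fraction(x) / m, -Fraction(y) / m)


def mul_pairs(p, q):
    """Pair multiplication (x1*x2 - y1*y2, x1*y2 + x2*y1)."""
    (x1, y1), (x2, y2) = p, q
    return (x1 * x2 - y1 * y2, x1 * y2 + x2 * y1)


z = (3, 4)
w = reciprocal(*z)
print(w)                 # (Fraction(3, 25), Fraction(-4, 25))
print(mul_pairs(z, w))   # (Fraction(1, 1), Fraction(0, 1))
```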


It is also worth emphasising that, although part (e) of the proposition
involves an inequality, this is between two real numbers $|z_1 + z_2|$ and
$|z_1| + |z_2|$.
Although the subfield R is an ordered field, C is not an ordered field. We
can order C in the sense of chapter 4; for instance, we can define the relation
> by

$$x_1 + iy_1 > x_2 + iy_2 \iff \text{either } x_1 > x_2 \text{ or } x_1 = x_2 \text{ and } y_1 > y_2.$$

This is certainly an order relation. However, it does not blend happily with
the arithmetic; for instance

$$z > 0 \not\Rightarrow z^2 > 0,$$
as is demonstrated by the example
$$i > 0, \text{ but } i^2 = -1 < 0.$$


There is no way to define an order on C which fits with the arithmetic on
C so as to make C an ordered field in the sense of chapter 9. Doing so would
require a subset $\mathbb{C}^+ \subseteq \mathbb{C}$ such that

(i) $z_1, z_2 \in \mathbb{C}^+ \Rightarrow z_1 + z_2 \in \mathbb{C}^+$ and $z_1 z_2 \in \mathbb{C}^+$,
(ii) $z \in \mathbb{C} \Rightarrow z \in \mathbb{C}^+$ or $-z \in \mathbb{C}^+$,
(iii) $z \in \mathbb{C}^+$ and $-z \in \mathbb{C}^+ \Rightarrow z = 0$.

But (ii) gives $i \in \mathbb{C}^+$ or $-i \in \mathbb{C}^+$; in the first case (i) implies $i^2 \in \mathbb{C}^+$, in
the second $(-i)^2 \in \mathbb{C}^+$, so in either case $-1 \in \mathbb{C}^+$. Applying (i) again we
find $(-1)^2 \in \mathbb{C}^+$, so $1 \in \mathbb{C}^+$. This contradicts (iii) because $1, -1 \in \mathbb{C}^+$ but
$1 \neq 0$. Because of this lack of an order on C, inequalities between complex
numbers like $z_1 > z_2$ are nonsense unless the numbers involved are real. A
formula like $|z_1| > |z_2|$ is perfectly feasible, because $|z_1|, |z_2| \in \mathbb{R}$ and the
real numbers are an ordered field.
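The order relation defined above is the lexicographic order on pairs (x, y), which happens to be exactly how Python compares tuples. The following sketch uses that to show the failure concretely: i is "positive" under this order, yet $i \cdot i = -1$ is not. The helper `as_pair` is our own, for illustration.

```python
# Python compares tuples lexicographically: first by x, then by y.
# That is exactly the relation x1 + iy1 > x2 + iy2 defined above.

def as_pair(z):
    return (z.real, z.imag)


zero = complex(0, 0)
i = complex(0, 1)

print(as_pair(i) > as_pair(zero))       # True: i > 0 in this order
print(as_pair(i * i) > as_pair(zero))   # False: i*i = -1 is not > 0
# In an ordered field, positive times positive must be positive,
# so this order is not compatible with multiplication.
```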


Euler’s Approach to the Exponential Function 


In the next section we define the exponential $e^z$ of a complex number z using
the real exponential and trigonometric functions. We establish the basic
property $e^{z+w} = e^z e^w$. We relate trigonometric functions to the complex
exponential, and prove De Moivre’s Theorem, an effective way to prove certain
basic trigonometric formulas. We use the results to give a geometric
interpretation of addition and multiplication of complex numbers. The complex
exponential will also be used in chapter 13 to study the symmetries of regular
polygons.

First, however, we make a few remarks about the history of the ideas
concerned. A lengthy historical development led to the remarkable insight that
in the world of complex numbers, trigonometric and exponential functions
are intimately related; in fact, they are two aspects of the same idea. The relationship
was first discovered by Euler by manipulating the power series for the
complex exponential, sine and cosine functions.

His method was purely algebraic, dealing with infinite series. He wrote the
exponential function as

$$e^z = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \cdots + \frac{z^n}{n!} + \cdots$$

and assumed that the series worked for a complex number z. He then wrote
out the power series for sine and cosine (again for a complex number z) as

$$\sin z = z - \frac{z^3}{3!} + \frac{z^5}{5!} - \cdots + (-1)^{n-1}\frac{z^{2n-1}}{(2n-1)!} + \cdots$$
and
$$\cos z = 1 - \frac{z^2}{2!} + \frac{z^4}{4!} - \cdots + (-1)^n \frac{z^{2n}}{(2n)!} + \cdots$$

and substituted $z = i\theta$ to give the remarkable equation
$$e^{i\theta} = \cos\theta + i\sin\theta.$$


He could then take $\theta = \pi$, where $\cos\pi = -1$ and $\sin\pi = 0$, to get the
relationship

$$e^{i\pi} = -1.$$

You can bet that Euler was pleased with this!
In fact, if you multiply this equation by minus one, you get the equation

$$-e^{i\pi} = 1$$

in which four of the most problematic aspects of arithmetic (the minus sign,
the irrational numbers e and π, and the complex number i) all combine
together to give the simple number 1.
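Euler's manipulation can be watched numerically. The sketch below sums the first terms of the exponential series at $z = i\theta$ and compares the result with $\cos\theta + i\sin\theta$ computed by Python's `math` module; `exp_series` and the cut-off of 30 terms are our own illustrative choices, and floating-point agreement to many decimal places is evidence, not proof.

```python
import math


def exp_series(z, terms=30):
    """Partial sum 1 + z + z^2/2! + ... + z^(terms-1)/(terms-1)!."""
    total, term = 0j, 1 + 0j
    for n in range(terms):
        total += term
        term *= z / (n + 1)     # next term z^(n+1)/(n+1)!
    return total


for theta in (0.5, 1.0, math.pi):
    lhs = exp_series(1j * theta)
    rhs = complex(math.cos(theta), math.sin(theta))
    print(theta, lhs, rhs, abs(lhs - rhs) < 1e-12)

# theta = pi gives (approximately) -1, so -e^{i pi} is approximately 1.
print(-exp_series(1j * math.pi))
```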

In this book our goal is to focus on the mathematical foundations, so the 
study of complex power series is postponed to a later course (see, for ex- 
ample, [34]). In such a course, once the theory of power series has been 
developed, we can provide very elegant proofs of these results, including the 
formula for cos(A+B) and sin(A+B) for A, B not only real, but also complex. 
However, at this point, it is instructive to attack the problem in a direct way 
using only real exponential and trigonometric functions and their properties 
derived in more elementary mathematics. 
Addition Formulas for Cosine and Sine


The most important properties of trigonometric functions, in this connec- 
tion, are the addition formulas for cosine and sine: 


cos(A+B) = cos A cos B - sin A sin B, (11.3) 
sin(A+B) = sin A cos B + cos A sin B. (11.4) 


You may have seen a geometric proof of these, but this may be in terms of
trigonometry in right-angled triangles where the angles concerned
are less than a right angle. In this case, we take an angle A and
consider a right triangle with one angle equal to A, as in figure 11.5 (left).
Then, with the triangle sides as shown, we define 


cos A = x/r, sin A = y/r, (11.5) 


and remark that by similar triangles these ratios do not depend on r. In par- 
ticular, we could set r = 1 from the start. (The tangent is given by tan A = y/x, 
but here we focus just on the cosine and sine.) 


Fig. 11.5 Relationships in a right-angled triangle 


This approach initially assumes that A is an acute angle: $0 < A < \pi/2$.
If A is an obtuse angle, so that $\pi/2 < A < \pi$, the natural right triangle lies
to the left of the y-axis, so x is negative. The internal angle of the triangle is
$\pi - A$. Using (11.5) in this new context, we see that

$$\cos A = -\cos(\pi - A), \qquad \sin A = \sin(\pi - A) \tag{11.6}$$

for $\pi/2 < A < \pi$.

It is possible to continue like this, extending to the ranges $\pi < A < 3\pi/2$
and $3\pi/2 < A < 2\pi$. Then all other real values for A can be dealt with using
periodicity:

$$\cos(A + 2\pi) = \cos A, \qquad \sin(A + 2\pi) = \sin A.$$
However, this approach involves a lot of cases and gets quite complicated.
Here we present an alternative approach, which takes a detour through complex
numbers. Along the way, we define cos A and sin A for all real A, and
deduce (11.3), (11.4) for all real A, B. Finally, we verify that the resulting
extensions of cos and sin to the entire real line agree with those obtained by
extending the range of values for A on a case-by-case basis.


Theorem 11.3: If $0 < A, B < \pi/2$ then (11.3), (11.4) are valid.
Fig. 11.6 The proof of the formula for sin(A+B)


Proof: We may assume that r = 1 for simplicity. Consider figure 11.6, which
tacitly assumes not only that $0 < A, B < \pi/2$, but that $0 < A + B < \pi/2$.
Assume for the moment that this is true.
In figure 11.6, the three important right triangles are 
OWX, which tells us about sin A and cos A, 
OPY, which tells us about sin B and cos B, 
OSY, which tells us about sin(A+B) and cos(A+B). 


By the definition of sine and cosine, 


OW = cos A, 
WX = sin A, 
OP = cosB, 
PY = sinB. 


We also need to know that triangles OQP and OWX are similar. The scale
factor here is the ratio OP : OX, which is cos B : 1. So triangle OQP has the
same shape as OWX, but its size is multiplied by cos B.
Now cos(A+B) = OS = OQ - QS.
By the remark on similar triangles, OQ : OW = cos B, so

OQ = OW cos B = cos A cos B.
Also, triangle YMP is similar to OWX with scale factor YP : OX = sin B, so
QS = WX sin B = sin A sin B.
Therefore
cos(A+B) = cos A cos B - sin A sin B,

which is (11.3). The proof for sin(A+B) is similar, based on sin(A+B) = YS =
YM + MS.

What if $A + B > \pi/2$? Now the picture is similar to figure 11.6, but Y is to the
left of the vertical axis. The same line of argument works, but it is necessary
to use (11.6) and be careful about signs.
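Once cos and sin are available (here from Python's `math` library, which we assume agrees with the triangle definitions on acute angles), it is easy to spot-check (11.3) and (11.4) numerically, including pairs with $A + B > \pi/2$, the case that needs extra care with signs in the geometric proof. This is a consistency check, not part of the argument.

```python
import math

# Pairs of acute angles; the last two have A + B > pi/2.
pairs = [(0.3, 0.4), (0.7, 0.5), (1.2, 0.9), (1.5, 1.4)]

for A, B in pairs:
    cos_sum = math.cos(A) * math.cos(B) - math.sin(A) * math.sin(B)   # (11.3)
    sin_sum = math.sin(A) * math.cos(B) + math.cos(A) * math.sin(B)   # (11.4)
    assert math.isclose(math.cos(A + B), cos_sum, abs_tol=1e-12)
    assert math.isclose(math.sin(A + B), sin_sum, abs_tol=1e-12)
    print(f"A={A}, B={B}, A+B>pi/2: {A + B > math.pi / 2}")
```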


We now link the trigonometric functions to the complex exponential
function, starting with an angle θ in the range $0 \leq \theta \leq \pi/2$. For such θ we
define

$$e^{i\theta} = \cos\theta + i\sin\theta. \tag{11.7}$$


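We can use (11.3), (11.4) to prove:

Lemma 11.4: If $0 < \theta, \phi < \pi/2$, then
$$e^{i(\theta + \phi)} = e^{i\theta} e^{i\phi}.$$
Proof: From (11.7),
$$e^{i(\theta + \phi)} = \cos(\theta + \phi) + i\sin(\theta + \phi).$$
Since $0 < \theta, \phi < \pi/2$ we can appeal to (11.3), (11.4) to get

$$\begin{aligned} e^{i(\theta+\phi)} &= \cos(\theta+\phi) + i\sin(\theta+\phi) \\ &= \cos\theta\cos\phi - \sin\theta\sin\phi + i(\sin\theta\cos\phi + \cos\theta\sin\phi) \\ &= (\cos\theta + i\sin\theta)(\cos\phi + i\sin\phi) \\ &= e^{i\theta}e^{i\phi}. \end{aligned}$$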


The next step is to extend the definition of the exponential of iθ to any real
number θ. The main point is that putting $\theta = \pi/2$ in equation (11.7) tells us
that

$$e^{i\pi/2} = \cos(\pi/2) + i\sin(\pi/2) = 0 + i \cdot 1 = i.$$




We therefore define
$$e^{i(\theta + \pi/2)} = i e^{i\theta}. \tag{11.8}$$

Initially, we know what $e^{i\theta}$ means only when $0 \leq \theta \leq \pi/2$. Use (11.8) with
θ in this range to define $e^{i\theta}$ in the range $\pi/2 \leq \theta \leq \pi$. Observe that there is
no contradiction at π/2 since (11.8) applied with θ = 0 gives $e^{i\pi/2} = i$.

Inductively, we can now extend the range to all positive real θ by repeatedly
multiplying by i. Moreover, if we replace θ by $\theta - \pi/2$ in (11.8) and divide
through by i we get

$$e^{i(\theta - \pi/2)} = -i e^{i\theta},$$

so we can also extend the definition to negative real numbers θ. Again, the
definition is consistent whenever the endpoints of ranges of θ coincide.
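The extension by (11.8) is effectively an algorithm: shift θ by a whole number of quarter turns into the base range $[0, \pi/2)$, evaluate (11.7) there, and multiply by the matching power of i. Here is a minimal Python sketch of that procedure; `exp_i` is a hypothetical name, Python's `math.cos` and `math.sin` are called only on the base range, and the comparison with `cmath.exp` is just a check.

```python
import cmath
import math


def exp_i(theta):
    """e^{i theta} built from (11.7) on [0, pi/2) and repeated use of (11.8)."""
    q = math.floor(theta / (math.pi / 2))   # number of quarter turns (may be negative)
    base = theta - q * (math.pi / 2)        # reduced angle, 0 <= base < pi/2
    value = complex(math.cos(base), math.sin(base))   # (11.7) on the base range
    # (11.8): each quarter turn up multiplies by i. For negative q we use
    # i^q = i^(q mod 4), which holds because i^4 = 1 (so 1/i = -i = i^3).
    return value * (1j ** (q % 4))


for theta in (0.3, 2.0, -1.0, 7.5, -10.0):
    assert abs(exp_i(theta) - cmath.exp(1j * theta)) < 1e-12

print(exp_i(math.pi / 2))  # 1j, that is, i
```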

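A direct consequence of (11.8) is:

Lemma 11.5: For any $\theta \in \mathbb{R}$,
$$e^{i(\theta + 2\pi)} = e^{i\theta}.$$
Proof: By (11.8) applied four times,

$$e^{i(\theta + 2\pi)} = i e^{i(\theta + 3\pi/2)} = i^2 e^{i(\theta + \pi)} = i^3 e^{i(\theta + \pi/2)} = i^4 e^{i\theta} = e^{i\theta},$$

since $i^4 = 1$.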


Having defined the exponential of iθ for all real θ, we can use (11.7) in the
opposite direction to define cos and sin for all real θ:

Definition 11.6:
$$\cos\theta = \operatorname{Re} e^{i\theta}, \qquad \sin\theta = \operatorname{Im} e^{i\theta}.$$

This definition makes both sin and cos periodic with period 2π, as we
would expect:
Proposition 11.7: For any $\theta \in \mathbb{R}$,
$$\sin(\theta + 2\pi) = \sin\theta, \qquad \cos(\theta + 2\pi) = \cos\theta.$$

Proof: Use lemma 11.5 and equate real and imaginary parts.


Using the formulae for sin(A+B) and cos(A+B) for real values, we can then
establish:

Proposition 11.8: $e^{i(x+y)} = e^{ix}e^{iy}$ for $x, y \in \mathbb{R}$.
Proof:
$$\begin{aligned} e^{i(x+y)} &= \cos(x+y) + i\sin(x+y) \\ &= \cos x\cos y - \sin x\sin y + i(\sin x\cos y + \cos x\sin y) \\ &= (\cos x + i\sin x)(\cos y + i\sin y) \\ &= e^{ix}e^{iy}. \end{aligned}$$


The Complex Exponential Function 


The final step in defining $e^z$ for all complex z is to remove the restriction that
z should be purely imaginary, that is, z = iy.

Definition 11.9: Let $z = x + iy \in \mathbb{C}$. Then

$$e^z = e^x\cos y + ie^x\sin y. \tag{11.9}$$

Since x and y are real, this expression makes sense. If y = 0, so that
$z = x \in \mathbb{R}$, it implies that

$$e^z = e^x\cos 0 + ie^x\sin 0 = e^x,$$
since cos 0 = 1, sin 0 = 0. So the complex exponential reduces to the usual
real exponential when z is real, which goes some way towards justifying the
notation $e^z$.
Moreover, (11.9) immediately implies that

$$e^{x+iy} = e^x e^{iy}.$$


We can now establish a basic property of the complex exponential
function:

Theorem 11.10: If $z, w \in \mathbb{C}$ then
$$e^{z+w} = e^z e^w.$$
Proof: Let z = x + iy, w = u + iv, where $x, y, u, v \in \mathbb{R}$. Then

$$\begin{aligned} e^{z+w} &= e^{x+iy+u+iv} \\ &= e^{(x+u)+i(y+v)} \\ &= e^{x+u}e^{i(y+v)} && \text{by definition 11.9} \\ &= e^x e^u e^{iy} e^{iv} && \text{by proposition 11.8} \\ &= e^x e^{iy} e^u e^{iv} \\ &= e^{x+iy}e^{u+iv} && \text{by definition 11.9} \\ &= e^z e^w. \end{aligned}$$
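Definition 11.9 translates directly into code. The sketch below (the name `complex_exp` is ours) builds $e^z$ from the real exponential, cosine, and sine exactly as in (11.9), then spot-checks theorem 11.10 and compares with Python's `cmath.exp`, which we assume computes the same function.

```python
import cmath
import math


def complex_exp(z):
    """e^z = e^x cos y + i e^x sin y for z = x + iy, as in (11.9)."""
    x, y = z.real, z.imag
    return complex(math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))


z, w = complex(0.5, 1.2), complex(-1.3, 2.7)
lhs = complex_exp(z + w)
rhs = complex_exp(z) * complex_exp(w)
print(lhs, rhs)
assert abs(lhs - rhs) < 1e-12                      # theorem 11.10
assert abs(complex_exp(z) - cmath.exp(z)) < 1e-12  # agrees with the library
assert complex_exp(complex(2, 0)) == math.exp(2)   # real case: e^x
```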


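We can now prove:

Theorem 11.11 (De Moivre’s Theorem): If $n \in \mathbb{N}$ then
$$(\cos\theta + i\sin\theta)^n = \cos n\theta + i\sin n\theta.$$

Proof: By definition 11.9 this is equivalent to proving that $(e^{i\theta})^n = e^{in\theta}$. Use
induction on n. If n = 1 both sides are identical. Suppose the result is true for
n, and consider:

$$\begin{aligned} (e^{i\theta})^{n+1} &= (e^{i\theta})^n e^{i\theta} = e^{in\theta}e^{i\theta} \\ &= e^{in\theta + i\theta} && \text{by theorem 11.10} \\ &= e^{i(n+1)\theta}, \end{aligned}$$

which completes the induction.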


Examples 11.12: Let n = 2. Then $(\cos\theta + i\sin\theta)^2 = \cos 2\theta + i\sin 2\theta$.
Expand the first expression as $\cos^2\theta - \sin^2\theta + i(2\cos\theta\sin\theta)$ and equate real
and imaginary parts to get

$$\cos 2\theta = \cos^2\theta - \sin^2\theta, \qquad \sin 2\theta = 2\cos\theta\sin\theta,$$

which are familiar trigonometric formulas.
Let n = 3. A similar calculation, expanding the cube of $(\cos\theta + i\sin\theta)$,
yields:

$$\cos 3\theta = \cos^3\theta - 3\cos\theta\sin^2\theta, \qquad \sin 3\theta = 3\cos^2\theta\sin\theta - \sin^3\theta.$$

The method extends to larger multiples of θ.
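As a numerical sanity check on De Moivre's theorem and the n = 3 formulas, this short Python sketch raises $\cos\theta + i\sin\theta$ to successive powers with built-in complex arithmetic and compares against `math.cos` and `math.sin`. The angle 0.7 is an arbitrary choice, and agreement within rounding error illustrates the result rather than proving it.

```python
import math

theta = 0.7
c, s = math.cos(theta), math.sin(theta)

# De Moivre: (cos t + i sin t)^n = cos nt + i sin nt, checked for n = 1..10.
for n in range(1, 11):
    lhs = complex(c, s) ** n
    assert abs(lhs - complex(math.cos(n * theta), math.sin(n * theta))) < 1e-12

# The n = 3 formulas obtained by equating real and imaginary parts.
assert math.isclose(math.cos(3 * theta), c**3 - 3 * c * s**2)
assert math.isclose(math.sin(3 * theta), 3 * c**2 * s - s**3)
print("De Moivre checks passed for theta =", theta)
```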
We can also express the sine and cosine using exponentials:


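Theorem 11.13: If $\theta \in \mathbb{R}$ then

$$\cos\theta = \frac{e^{i\theta} + e^{-i\theta}}{2}, \qquad \sin\theta = \frac{e^{i\theta} - e^{-i\theta}}{2i}.$$

Proof: Use the equations
$$e^{i\theta} = \cos\theta + i\sin\theta, \qquad e^{-i\theta} = \cos\theta - i\sin\theta$$

and solve for cos θ and sin θ.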
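Theorem 11.13 can be spot-checked with Python's `cmath.exp`: the combinations below should come out real (up to rounding) and match `math.cos` and `math.sin`. The sample angles are arbitrary, and this illustrates the identities rather than proving them.

```python
import cmath
import math

for theta in (0.0, 0.4, 2.5, -3.0):
    e_plus, e_minus = cmath.exp(1j * theta), cmath.exp(-1j * theta)
    cos_from_exp = (e_plus + e_minus) / 2
    sin_from_exp = (e_plus - e_minus) / (2j)
    # Both results should be real (imaginary parts ~ 0) and match math.cos/sin.
    assert abs(cos_from_exp - math.cos(theta)) < 1e-12
    assert abs(sin_from_exp - math.sin(theta)) < 1e-12

print("theorem 11.13 checked on sample angles")
```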


The real and imaginary parts x, y of a complex number z = x + iy are
the Cartesian coordinates of z in the complex plane. Another useful system,
polar coordinates, leads to a different way to represent z:

Theorem 11.14: Every nonzero $z \in \mathbb{C}$ has a unique expression in the form $z = re^{i\theta}$,
where $r, \theta \in \mathbb{R}$, $r > 0$, and $0 \leq \theta < 2\pi$.

Proof: Write $re^{i\theta} = r\cos\theta + ir\sin\theta$ and then solve the equations $r\cos\theta = x$,
$r\sin\theta = y$ with the stated conditions (see figure 11.7).

Fig. 11.7 Representing a complex number z in the form $z = re^{i\theta} = r(\cos\theta + i\sin\theta)$

By Pythagoras’ theorem, the number r is equal to the modulus
$|z| = \sqrt{x^2 + y^2}$. The angle θ is called the argument of z, written arg z.
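To see theorem 11.14 in action, here is a small Python sketch that computes r and θ for a nonzero z. It assumes the standard library's `math.atan2`, which returns an angle in $(-\pi, \pi]$, and shifts negative angles by 2π into the range $0 \leq \theta < 2\pi$ used in the theorem; `polar_form` is a hypothetical helper name.

```python
import cmath
import math


def polar_form(z):
    """Return (r, theta) with z = r e^{i theta}, r > 0, 0 <= theta < 2*pi (z != 0)."""
    if z == 0:
        raise ValueError("0 has no argument")
    r = math.hypot(z.real, z.imag)             # |z| = sqrt(x^2 + y^2)
    theta = math.atan2(z.imag, z.real)         # in (-pi, pi]
    if theta < 0:
        theta += 2 * math.pi                   # shift into [0, 2*pi)
    return r, theta


for z in (complex(1, 1), complex(-2, 0), complex(0, -3), complex(-1, -1)):
    r, theta = polar_form(z)
    assert 0 <= theta < 2 * math.pi
    assert abs(r * cmath.exp(1j * theta) - z) < 1e-12
    print(z, r, theta)
```

Python also provides `cmath.polar`, but it reports the angle in $(-\pi, \pi]$, so the explicit shift above is what matches the convention in the theorem.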

We can now interpret addition and multiplication of complex numbers 
geometrically. 

For addition, take a fixed but arbitrary complex number w = u + iv and let
z = x + iy be any number in C. Consider the map ‘add w’ from C to C:

$$A_w(z) = z + w.$$
Then
$$A_w(z) = (x + iy) + (u + iv) = (x + u) + i(y + v).$$

Clearly the effect of this map is to translate z a distance u along the real axis
and v along the imaginary axis. So the entire plane slides rigidly so that the
origin moves to w.

Multiplication can be described using the Cartesian form, but it makes
more sense in the polar coordinate representation. Take a fixed but arbitrary
complex number $w = se^{i\phi}$ and let $z = re^{i\theta}$ be any number in C. Consider the
map ‘multiply by w’ from C to C:

$$M_w(z) = zw.$$
Then
$$M_w(z) = re^{i\theta}se^{i\phi} = rse^{i(\theta + \phi)}.$$

The effect of this map is to multiply all distances from the origin by a factor
s (known as dilation) and to rotate the entire complex plane anticlockwise
about the origin through an angle φ.
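A quick numerical illustration of ‘multiply by w’ as a dilation followed by a rotation: for $w = se^{i\phi}$, each image point should have modulus s times the original, and argument increased by φ modulo 2π. The sketch uses `cmath.rect` and `cmath.phase` from Python's standard library; the values of s, φ and the sample points are arbitrary.

```python
import cmath
import math

s, phi = 2.0, math.pi / 3
w = cmath.rect(s, phi)                  # w = s e^{i phi}

points = [complex(1, 0), complex(0, 2), complex(-1.5, 0.5)]
for z in points:
    image = z * w                        # the map 'multiply by w'
    assert math.isclose(abs(image), s * abs(z))
    # Arguments add, up to a multiple of 2*pi.
    turn = (cmath.phase(image) - cmath.phase(z) - phi) % (2 * math.pi)
    assert min(turn, 2 * math.pi - turn) < 1e-12
    print(z, "->", image)
```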

Complex numbers manage to combine both Cartesian and polar coordinates
in a single mathematical system. Cartesians are best for addition, polars
for multiplication. To fill in a final piece of the picture: complex conjugation,
mapping z = x + iy to $\bar{z} = x - iy$, reflects the complex plane in the real axis.
So the algebra of complex numbers involves the three basic types of rigid
motion of the plane (translation, rotation, reflection) and dilations in a
natural manner. This makes complex notation a very efficient way to perform
calculations involving these transformations of the plane, as we will see in
chapter 13.


Quaternions 


We might attempt to extend the number system $\mathbb{N} \subset \mathbb{Z} \subset \mathbb{Q} \subset \mathbb{R} \subset \mathbb{C}$
further and look for an extension of C. For years in the nineteenth century Hamilton
followed up his conception of complex numbers as ordered couples (x, y) of
real numbers, searching for a system of triples $(x_1, x_2, x_3)$ with similar
properties to those of the complex numbers. He never found such a system; we
now know that none exists. But in 1843, in a marvellous piece of lateral
thinking, he found a system of quadruples $(x_1, x_2, x_3, x_4)$ that is ‘almost’ a field. It
satisfies all the field axioms, except for commutativity of multiplication.
Definition 11.15: A division ring is an algebraic system consisting of a set
D and two binary operations +, × on D such that for all $a, b, c \in D$, and
writing ab for a × b as usual,

(A1) (a + b) + c = a + (b + c).

(A2) There exists $0 \in D$ such that for all $a \in D$, 0 + a = a + 0 = a.

(A3) Given $a \in D$, there exists $-a \in D$ such that a + (−a) = (−a) + a = 0.

(A4) a + b = b + a.

(M1) (ab)c = a(bc).

(M2) There exists $1 \in D$, $1 \neq 0$, such that for all $a \in D$, a1 = 1a = a.

(M3) Given $a \in D$, $a \neq 0$, there exists $a^{-1} \in D$ such that $aa^{-1} = a^{-1}a = 1$.

(D) a(b + c) = ab + ac, (b + c)a = ba + ca.


Hamilton’s quaternions, as he called his system of quadruples, is an example
of a division ring. Its multiplication is not commutative: for some elements
a, b we have $ab \neq ba$. His discovery can be explained in terms of three
symbols i, j, k multiplied according to the rules:

$$\begin{gathered} i^2 = j^2 = k^2 = -1, \\ ij = k, \quad jk = i, \quad ki = j, \\ ji = -k, \quad kj = -i, \quad ik = -j. \end{gathered}$$

The last six of these can be described by writing the symbols i, j, k in a
clockwise cycle, $i \to j \to k \to i$:

Fig. 11.8 Hamilton’s quaternions

Then the product of any two in clockwise order is the third, and the
product anticlockwise is minus the third.

Hamilton thought of a quadruple of real numbers $(x_1, x_2, x_3, x_4)$ as
$x_1 + ix_2 + jx_3 + kx_4$. He added them in the obvious way:

$$\begin{aligned} &(x_1 + ix_2 + jx_3 + kx_4) + (y_1 + iy_2 + jy_3 + ky_4) \\ &\quad = (x_1 + y_1) + i(x_2 + y_2) + j(x_3 + y_3) + k(x_4 + y_4) \end{aligned}$$
and multiplied them using the above rules for multiplying i, j, k. Written out
in full this amounts to:

$$\begin{aligned} &(x_1 + ix_2 + jx_3 + kx_4)(y_1 + iy_2 + jy_3 + ky_4) \\ &\quad = x_1y_1 - x_2y_2 - x_3y_3 - x_4y_4 \\ &\qquad + i(x_1y_2 + x_2y_1 + x_3y_4 - x_4y_3) \\ &\qquad + j(x_1y_3 - x_2y_4 + x_3y_1 + x_4y_2) \\ &\qquad + k(x_1y_4 + x_2y_3 - x_3y_2 + x_4y_1). \end{aligned}$$

This can be written in terms of ordered quadruples simply by replacing each
$a_1 + ia_2 + ja_3 + ka_4$ by $(a_1, a_2, a_3, a_4)$ in the obvious manner. So formally we
can define addition and multiplication of such quadruples by
$$(x_1, x_2, x_3, x_4) + (y_1, y_2, y_3, y_4) = (x_1 + y_1, x_2 + y_2, x_3 + y_3, x_4 + y_4),$$
$$(x_1, x_2, x_3, x_4)(y_1, y_2, y_3, y_4) = (a_1, a_2, a_3, a_4),$$

where

$$\begin{aligned} a_1 &= x_1y_1 - x_2y_2 - x_3y_3 - x_4y_4, \\ a_2 &= x_1y_2 + x_2y_1 + x_3y_4 - x_4y_3, \\ a_3 &= x_1y_3 - x_2y_4 + x_3y_1 + x_4y_2, \\ a_4 &= x_1y_4 + x_2y_3 - x_3y_2 + x_4y_1. \end{aligned}$$


We denote the set of all quadruples, with these operations, by H (for
Hamilton). These quadruples are called quaternions or (a more
old-fashioned term) hypercomplex numbers.

Proposition 11.16: The quaternions H form a division ring.

Proof: This is simply a matter of checking the axioms (A1)-(A4), (M1)-
(M3), (D) for H. They are all straightforward, although we will be the first
to admit that the associativity of multiplication (M1) is tedious to say the
least. The zero element in (A2) is (0, 0, 0, 0), the negative of $(x_1, x_2, x_3, x_4)$
in (A3) is $(-x_1, -x_2, -x_3, -x_4)$, the unit element in (M2) is (1, 0, 0, 0), and the
inverse of $(x_1, x_2, x_3, x_4) \neq (0, 0, 0, 0)$ in (M3) is

$$(x_1, x_2, x_3, x_4)^{-1} = (x_1/a, -x_2/a, -x_3/a, -x_4/a),$$

where $a = x_1^2 + x_2^2 + x_3^2 + x_4^2$.
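The multiplication rule for quadruples can be typed in almost verbatim. The following minimal sketch (the function names `qmul`, `qinv` and `neg` are our own) represents a quaternion as a 4-tuple $(x_1, x_2, x_3, x_4)$, uses `fractions.Fraction` so that inverses are exact, and checks Hamilton's rules for i, j, k and the inverse formula from the proof of proposition 11.16 on a sample quaternion.

```python
from fractions import Fraction


def qmul(x, y):
    """Product of quaternions given as 4-tuples, using the a1..a4 formulas."""
    x1, x2, x3, x4 = x
    y1, y2, y3, y4 = y
    return (x1*y1 - x2*y2 - x3*y3 - x4*y4,
            x1*y2 + x2*y1 + x3*y4 - x4*y3,
            x1*y3 - x2*y4 + x3*y1 + x4*y2,
            x1*y4 + x2*y3 - x3*y2 + x4*y1)


def qinv(x):
    """Inverse (x1/a, -x2/a, -x3/a, -x4/a) with a = x1^2 + x2^2 + x3^2 + x4^2."""
    a = sum(Fraction(t) ** 2 for t in x)
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    x1, x2, x3, x4 = x
    return (x1 / a, -x2 / a, -x3 / a, -x4 / a)


def neg(q):
    return tuple(-t for t in q)


one, i, j, k = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

assert qmul(i, i) == qmul(j, j) == qmul(k, k) == neg(one)
assert qmul(i, j) == k and qmul(j, k) == i and qmul(k, i) == j
assert qmul(j, i) == neg(k) and qmul(k, j) == neg(i) and qmul(i, k) == neg(j)

q = (1, 2, -3, 4)
assert qmul(q, qinv(q)) == one and qmul(qinv(q), q) == one
print(qinv(q))
```

Checking (M1) by hand is tedious, as the text admits; a loop over random integer quadruples comparing `qmul(qmul(a, b), c)` with `qmul(a, qmul(b, c))` gives quick, though non-conclusive, reassurance.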


Multiplication in H need not be commutative; for instance,

$$(0, 1, 0, 0)(0, 0, 1, 0) = (0, 0, 0, 1)$$
but
$$(0, 0, 1, 0)(0, 1, 0, 0) = (0, 0, 0, -1).$$

Writing i = (0, 1, 0, 0), j = (0, 0, 1, 0), k = (0, 0, 0, 1), this amounts to ij = k,
ji = −k as explained previously. Hamilton’s other rules for multiplication of i,
j, k also follow, because we set up the rule of multiplication to make it happen
that way.

If we look at the subset $\mathcal{C} = \{(x, y, 0, 0) \in \mathbb{H} \mid x, y \in \mathbb{R}\}$, we find that
multiplication on $\mathcal{C}$ reduces to

$$(x_1, y_1, 0, 0)(x_2, y_2, 0, 0) = (x_1x_2 - y_1y_2,\ x_1y_2 + x_2y_1,\ 0, 0),$$

and that this is commutative. The map $f : \mathbb{C} \to \mathbb{H}$ given by $f(x + iy) = (x, y, 0, 0)$
is easily seen to be an isomorphism of fields from C to $\mathcal{C}$. Via this
isomorphism we can regard C as a subset of H. Writing $(x_1, x_2, x_3, x_4)$ as
$x_1 + ix_2 + jx_3 + kx_4$, the function $f : \mathbb{C} \to \mathbb{H}$ becomes

$$f(x + iy) = x + iy + j0 + k0.$$

Inclusion $\mathbb{C} \subseteq \mathbb{H}$ regards the complex number x + iy as the quaternion
x + iy + j0 + k0.

Many properties of C can be generalised to H, hence the name
‘hypercomplex numbers’. For instance the conjugate of a quaternion
$q = x_1 + ix_2 + jx_3 + kx_4$ is

$$\bar{q} = x_1 - ix_2 - jx_3 - kx_4.$$

This has some of the properties of the complex conjugate, but not all. In
particular,

$$\overline{q_1 + q_2} = \bar{q}_1 + \bar{q}_2, \qquad \bar{\bar{q}} = q, \qquad \bar{q} = q \iff q \in \mathbb{R}.$$
However, the rule for the conjugate of a product becomes
$$\overline{q_1 q_2} = \bar{q}_2 \bar{q}_1,$$
as you can check by explicit calculation. Because multiplication is not
commutative, we can’t straighten this out by reversing the order of $\bar{q}_2$ and $\bar{q}_1$.
We can also define the modulus of a quaternion $q = x_1 + ix_2 + jx_3 + kx_4$ to be

$$|q| = \sqrt{x_1^2 + x_2^2 + x_3^2 + x_4^2}.$$
In this case,
$$\begin{gathered} |q| \in \mathbb{R}, \quad |q| \geq 0 \text{ for all } q \in \mathbb{H}, \\ |q| = 0 \iff q = 0, \\ q\bar{q} = |q|^2, \\ |q_1 q_2| = |q_1|\,|q_2|, \\ |q_1 + q_2| \leq |q_1| + |q_2|. \end{gathered}$$

The proofs of these formulas vary in difficulty, though none is truly hard. If
you are interested, you should seek to work them out for yourself. They are
analogous to the complex case, taking care with the non-commutativity of H.
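The reversed order in $\overline{q_1 q_2} = \bar{q}_2 \bar{q}_1$ and the multiplicativity of the modulus can both be checked exactly on integer quaternions. In this self-contained Python sketch (helper names `qmul`, `qconj`, `norm_sq` are ours) we compare squared moduli, which stay integers, to avoid square roots; the final line shows that keeping the original order fails for this pair.

```python
import itertools


def qmul(x, y):
    """Quaternion product on 4-tuples (same formulas as in the text)."""
    x1, x2, x3, x4 = x
    y1, y2, y3, y4 = y
    return (x1*y1 - x2*y2 - x3*y3 - x4*y4,
            x1*y2 + x2*y1 + x3*y4 - x4*y3,
            x1*y3 - x2*y4 + x3*y1 + x4*y2,
            x1*y4 + x2*y3 - x3*y2 + x4*y1)


def qconj(q):
    x1, x2, x3, x4 = q
    return (x1, -x2, -x3, -x4)


def norm_sq(q):
    """|q|^2 = x1^2 + x2^2 + x3^2 + x4^2, kept as an integer to avoid rounding."""
    return sum(t * t for t in q)


samples = [(1, 2, 0, -1), (0, 1, 1, 0), (3, -1, 2, 5), (2, 0, -4, 1)]
for p, q in itertools.product(samples, repeat=2):
    pq = qmul(p, q)
    assert qconj(pq) == qmul(qconj(q), qconj(p))         # order reverses
    assert norm_sq(pq) == norm_sq(p) * norm_sq(q)        # |pq| = |p||q|
    assert qmul(q, qconj(q)) == (norm_sq(q), 0, 0, 0)    # q q-bar = |q|^2

p, q = samples[0], samples[2]
print(qconj(qmul(p, q)) == qmul(qconj(p), qconj(q)))   # False: order matters
```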


As with complex numbers, for $q \in \mathbb{H}$, $q \neq 0$, we find $q\bar{q} = |q|^2$ where
$|q| \neq 0$, so

$$q\bar{q}/|q|^2 = 1$$
and
$$q^{-1} = \bar{q}/|q|^2.$$
Some properties of the quaternions are startling, to say the least. For
instance, we know that $i^2 = j^2 = k^2 = (-i)^2 = (-j)^2 = (-k)^2 = -1$, so the
equation $x^2 + 1 = 0$ has at least six solutions in H, namely $\pm i, \pm j, \pm k$.
In fact $(ib + jc + kd)^2 = -b^2 - c^2 - d^2$ for real b, c, d, so any quaternion ib + jc + kd where
$b^2 + c^2 + d^2 = 1$ is a solution of $x^2 + 1 = 0$. There are infinitely many
solutions in H.
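Here is a concrete, exact illustration of those infinitely many square roots of −1. Each rational t gives a point $(b, c) = ((1 - t^2)/(1 + t^2),\ 2t/(1 + t^2))$ on the unit circle, hence a distinct quaternion ib + jc with $b^2 + c^2 = 1$. The sketch squares a handful of them with the quadruple product and `fractions.Fraction`; the parametrisation and the helper `qmul` are our own choices for illustration.

```python
from fractions import Fraction


def qmul(x, y):
    """Quaternion product on 4-tuples (same formulas as in the text)."""
    x1, x2, x3, x4 = x
    y1, y2, y3, y4 = y
    return (x1*y1 - x2*y2 - x3*y3 - x4*y4,
            x1*y2 + x2*y1 + x3*y4 - x4*y3,
            x1*y3 - x2*y4 + x3*y1 + x4*y2,
            x1*y4 + x2*y3 - x3*y2 + x4*y1)


minus_one = (-1, 0, 0, 0)

# Every rational t gives a point (b, c) with b^2 + c^2 = 1, and hence a
# distinct pure quaternion ib + jc whose square is -1.
for t in (Fraction(n, 7) for n in range(-5, 6)):
    b = (1 - t * t) / (1 + t * t)
    c = 2 * t / (1 + t * t)
    x = (0, b, c, Fraction(0))
    assert qmul(x, x) == minus_one

# A root using all three of i, j, k: b, c, d = 1/3, 2/3, 2/3.
x = (0, Fraction(1, 3), Fraction(2, 3), Fraction(2, 3))
print(qmul(x, x))   # the quaternion -1, printed as Fractions
```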

This is unlike our experience in all previous number systems. In R the
equation $x^2 + 1 = 0$ has no solutions, in C it has two, and in general, in R or
C, an equation of degree n has at most n solutions. The sudden appearance
of more roots in the quaternions completely changes the game as a fondly
held belief fails in the new system.

The problem with intuition is that experience in one context need not lead
to expected properties in another. As we move through successively larger
number systems $\mathbb{N} \subset \mathbb{Z} \subset \mathbb{Q} \subset \mathbb{R} \subset \mathbb{C} \subset \mathbb{H}$, we gain some properties, but
we also lose others. In the natural numbers N, subtracting a number leaves
a smaller one, but not in the integers Z, where taking away a negative
number gives more. In the real numbers R, the square of a non-zero number is
always positive, but not in the complex numbers C. Now we find that in the
quaternions H, the theorem that every equation of degree n has at most n
roots is no longer true.

This continual change of meaning as mathematical systems are generalised 
can cause serious disorientation for the learner. But it is also the secret of 
developing more powerful mathematical systems. 
Prior to the introduction of the quaternions, the idea that multiplication
of numbers is independent of their order was always considered to be a 
self-evident, preordained law. The discovery of the quaternions revealed an 
algebraic system worthy of study in its own right, but it also revealed that it 
is possible to have algebraic systems in which ‘a times b’ need not equal ‘b 
times a’. This led to the many new algebraic structures studied in modern 
mathematics. For example, matrix multiplication is not commutative, and 
the theory of vectors and matrices is an essential feature of advanced algebra. 


The Change in Approach to Formal Mathematics 


At this point we free ourselves from the mental block that mathematics must 
work in a preordained natural way, by specifying new axiomatic systems in 
terms of some set with various prescribed operations that have specific prop- 
erties. We have already exemplified this process with rings, fields, ordered 
rings, ordered fields, and so on. The rationals, reals, and complex numbers 
are all examples of fields, so any theorem that we prove true for all fields will 
be true in all of these specific systems. There may also be theorems that are 
true in one field but not in another; for instance, that a Cauchy sequence in 
R or C will converge to a limit, but not in Q, or the square of a non-zero 
number is always positive in Q or R but not in C. 

The formal approach, starting from a list of axioms for a system, may seem 
complicated and abstract. However, once we have shifted to a formal ap- 
proach, the reverse is often true. Any theorems we prove in a given axiomatic 
system will remain true in any new system that satisfies the given axioms. 
This enables us to build more sophisticated structures based on established 
formal theories. 

It may also happen that some of the theorems we prove offer new ways of 
visualising and symbolising the ideas, giving us new ways of imagining the 
structure and operating with the elements of a formal system. As we shall 
see in the next part of the book, not only do intuitive ideas build into formal 
concepts, the resulting formal concepts may take us back to natural ways 
of visualising and operating symbolically with the axiomatic systems, now 
supported by the power of formal proof. 


Exercises 
1. If $z_1, \ldots, z_n$ are complex numbers, prove that

$$|z_1 + \cdots + z_n| \leq |z_1| + \cdots + |z_n|.$$
2. Let ω be the complex number defined by $\omega = (-1 + \sqrt{-3})/2$. Prove that

$$\omega^3 = 1 \text{ and } 1 + \omega + \omega^2 = 0.$$


3. Let $\omega = e^{i\theta}$ where $\theta = 2\pi/n$ for $n \in \mathbb{N}$. Show that $z = \omega^r$ satisfies
$z^n = 1$ and draw a picture showing the positions of $\omega, \omega^2, \ldots, \omega^n$
round the unit circle. (These numbers are called the nth roots of 1.)
Show that $1 + \omega + \cdots + \omega^{n-1} = 0$.
Factorise $z^n - 1$ into linear factors over C. By showing that

$$(z - \omega^r)(z - \omega^{n-r}) = z^2 - 2z\cos r\theta + 1,$$

factorise $z^n - 1$ into linear and quadratic factors.
In particular, factorise the real polynomial $x^5 - 1$ into real linear and
quadratic factors:

$$x^5 - 1 = (x - 1)(x^2 - 2x\cos(2\pi/5) + 1)(x^2 - 2x\cos(4\pi/5) + 1).$$


4. Use De Moivre’s theorem to find formulae for cos 4θ and sin 4θ in
terms of sin θ and cos θ.


5. For quaternions p, q verify

(a) $\overline{p + q} = \bar{p} + \bar{q}$
(b) $\overline{pq} = \bar{q}\bar{p}$
(c) $\bar{\bar{q}} = q$
(d) $\bar{q} = q \iff q \in \mathbb{R}$.


6. For $a, b \in \mathbb{H}$, show $(a + b)^2 = a^2 + ab + ba + b^2$. Give an example to
show that we cannot replace this by $(a + b)^2 = a^2 + 2ab + b^2$ in general.
If $a \in \mathbb{H}$, $b \in \mathbb{R}$, prove that $(a + b)^2 = a^2 + 2ab + b^2$. Solve the equation
$x^2 + 2x + 1 = 0$ in R, C, and H. (Let x = y − 1 and solve for y.) By
the substitution x = y + 1, solve the equation $x^2 - 2x + 2 = 0$ in R, C,
and H.


7. Solve the equation x (1 + j) +k = 2 + i for the quaternion x. 


8. Solve the equation ixj + k = 3 + 2j for the quaternion x. 


9. Find $x, y \in \mathbb{H}$ such that $3ix - 2jy = -1$, $xk + y = 0$.

10. Define complex quaternions $\mathbb{H}_{\mathbb{C}}$ to be quadruples $(a_1, a_2, a_3, a_4)$ of
complex numbers, with the same addition and product rules as given
for H on page 249. Which of the axioms for a field does $\mathbb{H}_{\mathbb{C}}$ satisfy?

11. Prove that the complex numbers are Cauchy complete, in the following
sense: if $(a_n)$ is a sequence of complex numbers such that for all $\varepsilon \in \mathbb{R}$,
$\varepsilon > 0$, there exists $N \in \mathbb{N}$ such that $|a_m - a_n| < \varepsilon$ for all $m, n > N$,
then $(a_n)$ tends to a limit in C. (Hint: Show that $x_n + iy_n \to x + iy \iff$
$x_n \to x$ and $y_n \to y$.)
12. Define a binary operation $\wedge$ on $\mathbb{R}^3$ (known as the vector product) as
follows:

$$(a, b, c) \wedge (d, e, f) = (bf - ce,\ cd - af,\ ae - bd).$$
Prove for all $x, y, z \in \mathbb{R}^3$,

$$x \wedge y + y \wedge x = 0,$$
$$(x \wedge y) \wedge z + (y \wedge z) \wedge x + (z \wedge x) \wedge y = 0.$$


13. Consider the ordered pairs of complex numbers $(z_1, z_2)$ with addition
and multiplication defined by

$$(z_1, z_2) + (w_1, w_2) = (z_1 + w_1, z_2 + w_2),$$
$$(z_1, z_2) \times (w_1, w_2) = (z_1w_1 - z_2\bar{w}_2,\ z_1w_2 + z_2\bar{w}_1).$$

Show that this is isomorphic to the quaternions. (Hint: Consider how
the complex numbers were constructed by imagining x + iy as the
ordered pair (x, y) with an appropriate addition and multiplication,
and generalise this to ordered pairs of complex numbers.)


14. Look up the octonions on the internet and see how the above
construction might be extended to ordered pairs of quaternions. The
quaternions lack the axiom of commutative multiplication. What is 
lost in extending to the octonions? Which is a more productive way 
to develop: generalising from real to complex to quaternions to octo- 
nions and so on, or to develop the general theory of vector spaces of 
dimension n over a field F? 
PART IV
Using Axiomatic Systems 


In part I we began by building on earlier experiences in mathematics. In part 
II we used our experiences to build ideas of set theory, logic, and proof. Then 
we used these principles in part III to give formal constructions of natural 
numbers and successively larger number systems. 

Our earlier metaphors of building houses and growing plants depend on 
the foundations on which we base our activities. Building on intuition may 
include implicit beliefs that cause our house to have weak foundations or our 
plants to grow in unsuitable soil. 

The way ahead is to specify clearly the foundations of set-theoretic axioms 
and definitions being used in a particular theory and to focus only on proper- 
ties that can be deduced from them by formal proof. These properties remain 
true, not only in systems that are already familiar, but also in any future ex- 
amples that satisfy the given axioms. This needs to be done with care to make 
sure that we do not use any implicit ideas that have not been proved from the 
axioms and definitions. 

Having built up a framework of theorems proved from the foundational 
axioms and definitions, some theorems, called structure theorems, may prove 
that the system also has visual and symbolic properties, allowing us to im- 
agine formal structures in more natural ways. This enables us to imagine 
new possibilities that we may then seek to prove formally. 

In the case of a complete ordered field, we were led to the visual structure 
of a number line, and the symbolic structure of decimal arithmetic. Structure 
theorems let us complement formal theory with natural visual and symbolic 
models that we can use to imagine new possibilities. 

The first chapter of this part examines these general ideas. The next three chapters consider examples of formal systems and their natural interpretations. The first deals with the notion of a group, a central idea in formal mathematics. The other two describe extensions of the natural number system N to infinite cardinal numbers, and of the real numbers R to larger ordered fields. Both have structural properties that extend natural ideas to give new intuitions now based on formal axioms and proof. This reveals the great power of formal mathematics, with structures that may be imagined visually and symbolically, now supported by formal proof.


CHAPTER 12


Axiomatic Systems, Structure Theorems, and Flexible Thinking


As we construct successively larger number systems $\mathbb{N} \subset \mathbb{Z} \subset \mathbb{Q} \subset \mathbb{R} \subset \mathbb{C} \subset \mathbb{H}$, each stage gains by generalising some properties, but other meanings change. We can talk of prime numbers and factorisation in the natural numbers, but this has no relevance for real numbers in general. Fondly held beliefs may fail in more general structures, as we found when we shifted from the natural numbers to introduce negative or complex numbers.

While the generalisation of ideas can give great power and pleasure, 
changes in meaning can cause serious disorientation, not only for learners, 
but also for research mathematicians. Even a change in a single property, 
such as the loss of the commutative law in the quaternions, has unforeseen 
consequences. For instance, we saw that a quaternionic polynomial can have 
an infinite number of roots, and it is not immediately clear how the loss of 
commutativity leads to that effect. 
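The infinitely many roots can be exhibited concretely for the simplest case, $q^2 = -1$. Any pure quaternion $bi + cj + dk$ with $b^2 + c^2 + d^2 = 1$ squares to $-1$, and the unit sphere contains infinitely many rational points. The Python sketch below is an illustration under these assumptions: the helper names `qmul` and `unit_pure_quaternion` are hypothetical, quaternions are stored as 4-tuples $(a, b, c, d)$ meaning $a + bi + cj + dk$, and rational sphere points are generated by inverse stereographic projection, one standard choice among many.

```python
from fractions import Fraction

def qmul(p, q):
    """Hamilton product of p = (a, b, c, d), meaning a + bi + cj + dk, with q."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

def unit_pure_quaternion(u, v):
    """Send a rational point (u, v) of the plane to a rational point (b, c, d)
    on the sphere b^2 + c^2 + d^2 = 1 (inverse stereographic projection),
    and return the pure quaternion bi + cj + dk."""
    s = u * u + v * v
    return (Fraction(0), 2 * u / (s + 1), 2 * v / (s + 1), (s - 1) / (s + 1))

minus_one = (-1, 0, 0, 0)
roots = {unit_pure_quaternion(Fraction(u), Fraction(v))
         for u in range(-5, 6) for v in range(-5, 6)}
assert all(qmul(q, q) == minus_one for q in roots)
print(len(roots))  # 121 distinct solutions of q^2 = -1 from this small grid alone
```

Enlarging the grid produces as many further roots as we like, whereas in $\mathbb{C}$ the same equation has only the two roots $\pm i$.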

These long-term changes in meaning are not only problematic for you, the reader; they have also occurred in the beliefs of communities of mathematicians as ideas evolve over the generations. Not only did they occur in the past; they continue to occur in the present, and will undoubtedly continue into the future as the boundaries of mathematics keep expanding.

When the Greeks began to formulate geometry, they imagined that points, lines, and planes had subtle meanings more perfect than a particular picture drawn on paper or scratched in sand. A point for the Greeks was not just a mark on paper; it represented a unique location in the plane or in space. A straight line was not just the practical result of drawing a pencil line by hand guided by a ruler; it was a representation of a perfect straight line with a Platonic existence beyond any human capacity to represent it physically. A circle was more perfect than a curve drawn physically using a pair of compasses: it was the locus of a sizeless point moving on a plane at a fixed distance from its centre.

Likewise, whole numbers can be represented practically by counting 
pebbles and placing them in patterns that reveal theoretical structures. For 
example, given a number of pebbles, we can sometimes place them physically 
as a rectangular array and sometimes we cannot, leading to the conceptions 
of composite and prime numbers, and eventually to the formal proof that 
there is an infinite number of primes, and that every whole number can be 
uniquely represented as a product of primes. 

The ancient Greek conception of mathematics was based on phenomena that occur in nature, yet were imagined to have perfect Platonic properties that were physically unattainable. In this sense, their mathematics is natural, in that it is based on observed natural phenomena. Yet they sought a perfect theoretical foundation, arising in their imagination, which took them beyond what is physically attainable in nature.

As they contemplated more general numbers—which to their way of 
thinking had to be done using geometry—they first imagined numbers as 
magnitudes that measured lengths, areas, and volumes. They related these 
magnitudes to ratios of whole numbers, based on experience in other areas, 
such as music where causing a string to vibrate at a half, a third, or two thirds 
of its length produces harmonics that provide the basis of musical theory. But 
then they discovered that the hypotenuse of a right-angled triangle with unit 
sides is not rational, so they had to take this into account in developing their 
mathematical theories. 

Subsequent communities of mathematicians broadened these ideas by 
introducing new number systems, each of which was accompanied by lan- 
guage that expressed concerns about the new meanings: positive and negative 
numbers, rationals and irrationals, real and complex numbers that have real 
and imaginary parts. Words such as negative, irrational, and imaginary all have negative connotations. At
every stage, these new number systems were initially imagined to be more 
abstract than the old ones, and did not appear to relate to naturally oc- 
curring phenomena. But at a later stage, as mathematicians became more 
sophisticated, they found new ways of imagining negative numbers as points 
on an extended number line and complex numbers as points in the plane. 
Moreover, the familiar older concepts started to look just as puzzling as the 
new ones. By the time mathematicians finally understood what a complex 
number was, they had started to wonder about real numbers. 

Geometric ideas continued to be based on imagining points as entities 
that can be marked on lines and lines that go through points. Even when 
Descartes envisaged points in the plane using a pair of numbers (x, y), the
Greek view of points and lines continued as the natural basis of geometric 
thinking. 

Newton explained naturally occurring phenomena, such as gravity and 
the movement of the planets, using a combination of Greek geometry and 
symbolic algebra to build his ideas in the calculus. Leibniz imagined quan- 
tities that could be infinitesimally small and produced a powerful symbolism 
for the calculus that has stood the test of time, despite widespread concerns 
about its logical foundation. Later giants in mathematical development fo- 
cused on different aspects. Euler manipulated symbols algebraically using 
power series and complex numbers, and Cauchy imagined infinitesimals 
geometrically as variable quantities on the line or in the plane that be- 
come arbitrarily small. His approach led to major advances, using a blend 
of visual and symbolic methods in real and complex analysis, but it also 
generated significant criticism about the precise meaning. The critics had 
a point: the meaning had not then been fully worked out. What prevailed 
was more an act of faith, that everything would work out much as it al- 
ways had done. Many of Euler’s published papers would have caused him 
to fail today’s examinations and Cauchy’s ideas of infinitesimal quantities 
were later heavily criticised. 

In the latter part of the nineteenth century and the early twentieth, a shift 
occurred from natural mathematics to more formal methods. Mathematical 
entities were introduced through set-theoretic definitions and their proper- 
ties were deduced solely through mathematical proof. A seminal moment 
occurred when David Hilbert, taking refreshment with colleagues in the Ber- 
lin railway station after a lecture on the foundations of geometry, is reputed 
to have said, ‘One must be able to say at all times—instead of points, straight 
lines, and planes—tables, chairs, and beer-mugs’ [7]. The significance of his 
insight was that mathematics did not need to refer only to naturally occurring phenomena. The focus of attention shifted from what the objects are to their formally defined properties.

Instead of thinking of points being marked on lines, the real number 
line was seen to be a set that consisted of points. While natural mathem- 
atics sensed points moving about smoothly on the line, formal mathem- 
atics re-interpreted numbers as fixed entities that make up the set of real 
numbers. 

At this period in the history of mathematics, new ways of thinking were 
introduced that would apply not only to naturally occurring situations but 
also to systems described only in terms of their formally stated properties. 
A range of possible developments occurred, with emphases on different 
aspects of mathematics, including: 
Intuitionism: natural mathematics based on human perception and construction; in particular, a construction must be performed explicitly by a finite sequence of operations, and proof by contradiction (deducing a statement from the absurdity of its negation) is not allowed.

Logicism: mathematics is based on formal logic without any reliance on 
natural intuition. 

Formalism: mathematics has a formal set-theoretic basis, which Hilbert 
acknowledged could be inspired by natural intuitive experiences, but which 
must be fully formulated in terms of set-theoretic definitions and formal 
proof. 

Subsequent mathematics has expanded into a diverse range of specialities 
as mathematicians focus on particular areas of interest. Applied mathemat- 
icians look at problems and formulate mathematical models that they use 
to find solutions. Physicists consider natural phenomena such as gravity 
or magnetism, and formulate mathematical models in terms of Newtonian 
mechanics or the four-dimensional space-time of Einstein’s theory of rela- 
tivity. They contemplate the origins of the universe in terms of the Big Bang, 
a mathematical model of an expanding universe. They imagine the structures of atoms, formulate models involving subatomic particles, and perform sophisticated experiments to see whether their models match the physical world.
Climate scientists develop mathematical models of natural changes in long- 
term patterns of weather. Economists model and predict the change and 
growth of economies. Sometimes the models are good predictors, sometimes 
not. If they prove to be inadequate, better models that make more accurate 
predictions are sought. 

Meanwhile, pure mathematicians seek to formulate precise theories that 
work consistently in well-specified contexts. They allow their imaginations 
to range over any phenomena that may intrigue them, seeking patterns 
and relationships to solve problems. At various times, some may use exist- 
ing theories to solve problems, some may build naturally on their previous 
experience to suggest new possibilities, some may reflect on established the- 
ories to seek new theorems, to make new formal definitions and establish 
new formal theories. Many will use a combination of approaches depend- 
ing on the context, as individuals develop their own preferences for ways of 
operating mathematically. 

Students taking courses in different topics are likely to encounter signifi- 
cant differences in approach. You should take these differing approaches in 
your stride. Diversity is an advantage. Mathematics is difficult: we need as 
many different ways to think about it as we can find. The more tools and 
methods you have at your fingertips, the more you can create. 

To help you to develop from a ‘natural’ attitude to mathematics at 
school to the wide variety of more sophisticated mathematics encountered 
at university, this book builds on natural experiences that are familiar to you
at the outset, and works towards a formalist approach while building links 
with the underlying logic. 

Once the formalist approach is mastered, two complementary modes of operation become available. A natural approach builds formal structures inspired by intuition; a formal approach builds them by proving their properties from set-theoretic definitions. You don't have to choose one: be flexible enough to use whichever seems most helpful or fruitful in a given context.

Agreed, a natural approach based on familiar mental images and symbolic 
operations may be easier for the human mind to grasp as a whole, but it still 
requires formal proof to show that the properties concerned do actually fol- 
low from the formal definition. You may also find new possibilities that you 
never dreamt of before. For instance, the complex numbers extend familiar 
decimals to a new system that has a square root of minus one, and the exten- 
sion from the complex numbers to the quaternions produces a system where 
multiplication is no longer commutative and quadratic equations can have 
an infinite number of roots. The formal approach will provide the structures 
needed to place these new ideas on a sound foundation. 

A formal approach pays attention to the precision of logical deduction 
from specific assumptions and can be used to build subtle schemas of men- 
tal relationships. It is helpful to give these schemas visual and symbolic 
meanings that make natural sense. This may be achieved by proving structure theorems showing that a given formal structure has formally deducible properties that may be used to picture the ideas or to represent them as symbols that can be manipulated to solve problems.

This allows mathematics to expand in various ways, based on logical 
deduction or thinking about formal systems naturally, using visual or op- 
erational ideas, now supported by the power of formal proof. 


Structure Theorems 


We have already established the axiomatic properties of the familiar number 
systems N, Z, Q, R, and their extensions to C and H. In chapter 8 we proved 
a structure theorem for the natural numbers: 


Any system that satisfies the Peano axioms is order isomorphic to the 
natural numbers N. 


This theorem tells us that the natural numbers are unique up to isomorph- 
ism, and lets us use the word ‘the’. Indeed, for any system where we start with 
a single element, then move on to another and another, always different from
any that came before, we have a potentially infinite set that is isomorphic to 
the natural numbers. 

In chapter 10 we introduced a number of axiomatic systems as sets with 
various prescribed operations satisfying specified properties, including rings, 
fields, ordered rings, ordered fields. A number of theorems were proved 
characterising these structures as follows: 


Every ring contains a subring isomorphic either to $\mathbb{Z}$ or to $\mathbb{Z}_n$ for some natural number $n$ (proposition 10.10).


Every field contains a subfield isomorphic either to $\mathbb{Q}$ or to $\mathbb{Z}_p$ for some prime number $p$ (proposition 10.11).


Every ordered ring contains a subring isomorphic to Z (proposition 10.12). 
Every ordered field contains a subfield isomorphic to Q (proposition 10.13). 


Every complete ordered field is isomorphic to the real numbers R 
(theorem 10.17) and so can be represented visually as points on a number 
line and symbolically as infinite decimals. 


All of these results are structure theorems. That is, they prove that each of 
these structures contains a specific system up to isomorphism—here one of 
$\mathbb{Z}$, $\mathbb{Z}_n$, $\mathbb{Q}$, $\mathbb{R}$ as appropriate. In such cases, this subsystem has a visual representation as points on a number line (or round a circle in the case of $\mathbb{Z}_n$) and
corresponding symbolic representations as whole numbers, integers modulo 
n, rationals, or infinite decimals. 
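A small brute-force experiment shows the finite cases of these structure theorems at work. In $\mathbb{Z}_n$ the element 1 generates the whole ring (its additive order is $n$, so adding 1 repeatedly goes once round the circle), and $\mathbb{Z}_n$ is a field exactly when $n$ is prime, matching the $\mathbb{Z}_p$ case of proposition 10.11. The Python sketch below checks both facts for small $n$; the function names are hypothetical, and a finite check is an illustration rather than a proof.

```python
from math import isqrt

def additive_order_of_one(n):
    """Least m >= 1 such that 1 + 1 + ... + 1 (m terms) is 0 in Z_n, for n >= 2."""
    total, m = 1, 1
    while total % n != 0:
        total += 1
        m += 1
    return m

def has_all_inverses(n):
    """True when every non-zero class in Z_n has a multiplicative inverse,
    i.e. when the commutative ring Z_n (n >= 2) is a field."""
    return all(any(a * b % n == 1 for b in range(1, n)) for a in range(1, n))

def is_prime(n):
    """Trial division up to the integer square root."""
    return n >= 2 and all(n % d != 0 for d in range(2, isqrt(n) + 1))

for n in range(2, 50):
    assert additive_order_of_one(n) == n          # 1 generates a copy of Z_n
    assert has_all_inverses(n) == is_prime(n)     # Z_n is a field iff n is prime
```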

Of course, when using visual representations, we need to be aware that the 
visual picture alone, as seen with our finite human vision, does not provide 
the full structure. For example, the rational numbers and the real numbers 
can both be represented as a number line, yet structurally they have very 
different symbolic and set-theoretic properties. Visual and symbolic thinking 
offer us modes of operation that can inspire formal structures that have well- 
defined consequences. On the other hand, using structure theorems, we can 
think about systems in visual and/or symbolic ways, now supported by formal
deduction. 

Structure theorems also tell us that we should relax. The human brain nat- 
urally makes links between mental concepts. Structure theorems let us think 
about formal systems in more brain-friendly visual and operational ways. 
When we started with the natural numbers $\mathbb{N}_0$ and constructed $\mathbb{Z}$, $\mathbb{Q}$, and $\mathbb{R}$, we did so by setting up equivalence relations and showing that there is an isomorphism between each system and a subsystem of the next. Subsequently we saw that we could start from the top with $\mathbb{R}$, and then find $\mathbb{Q}$, $\mathbb{Z}$, and $\mathbb{N}_0$ as subsystems of $\mathbb{R}$, without any need to talk about isomorphisms.
How we ‘identify’ one system with an isomorphic copy is sometimes called
‘abuse of notation’. Far from this being an abuse of the process of mathem- 
atical thinking, it uses notation in a flexible way, helping the human brain to 
work more simply. Isomorphic systems represent the same underlying crys- 
talline concept, which satisfies the required properties. In this way we obtain 
more concise natural ideas, such as: 


Every ring contains either $\mathbb{Z}$ or $\mathbb{Z}_n$ for some natural number $n$.
Every field contains either $\mathbb{Q}$ or $\mathbb{Z}_p$ for some prime number $p$.
Every ordered ring contains Z. 


Every ordered field contains Q. 


In terms of crystalline concepts, the natural numbers are the unique sys- 
tem satisfying the Peano Axioms, and the real number system is the unique 
complete ordered field. 


Psychological Aspects of Different Approaches 
to Mathematical Thinking 


Just as mathematicians develop their own personal ways of operating math- 
ematically, students also vary in their approach. On occasions they may build 
naturally on their previous experiences, formally through deduction from 
set-theoretic definitions and formal proof, procedurally, based on commit- 
ting proofs to memory to pass the examinations, or use a combination of 
these and other techniques [6]. You may find it helpful to reflect on how you 
make sense of formal mathematics to become aware of why you may have 
certain difficulties and how you might work to improve your understanding. 

You may prefer a natural approach based on your previous experiences. 
This can work well, but it is wise to reflect on the changes of meaning as new 
structures are encountered. It is helpful to develop a flexible understanding of
how old ideas may need to be rationalised to work in a new context where
new ideas clash with previous experience and cause confusion. Some lec- 
turers tell students to ‘forget all you know and start afresh from the formal 
definitions’, but this is difficult for anyone whose mind is full of earlier ideas 
with deeply embedded mental connections that behave subtly differently. It 
is important to be resilient and think carefully about strange new ideas, to 
make them your own by explaining to yourself how and why they work in 
the new context. 

You may prefer a formal approach, building a schema of ideas based only 
on definitions and proofs. These may be motivated by intuition, but the 
proofs have to lead to a formal schema of theorems that builds up the entire knowledge structure. Some students manage to do this successfully, but
many have trouble with new ideas that may either clash with previous experi- 
ence or involve complicated quantifiers that prove too difficult to cope with. 

You may prefer a procedural approach, learning specific procedures for 
solving routine problems in the course, and remembering theorems by 
heart to reproduce in examinations without being over-concerned about the 
meaning. 

You may use a combination of methods depending on the context. 

In every case you can enhance your understanding by reflecting on the 
proofs and explaining to yourself how and why the deductions work. 

Each approach has subtle aspects that affect your understanding of the 
mathematics. For example, when seeking to make sense of the convergence 
of a real number sequence (a,) to a limit a, a natural approach might imagine 
plotting the points a, for n= 1, 2,..., with a horizontal line y = a representing 
the limit value, and lines y = x-¢ and y = a+¢ above and below representing 
the allowable range of values for a given £ > 0. The definition then says that 
the sequence converges if for any e the terms a, lie within the allowable range 
of values from some n = N onwards. 


Fig. 12.1 A natural limit 


This diagram needs to be imagined dynamically. The values of the sequence are plotted first; then the horizontal line is placed in its appropriate place with the range $\pm\varepsilon$ above and below, then the value of $N$ is sought so that the values of $a_n$ for $n > N$ lie in the desired range. Then imagine $\varepsilon$ having a smaller value, and repeat. The phenomenon must occur for a fixed value of $a$ while $\varepsilon$ is taken as small as is desired. So as $\varepsilon$ shrinks towards zero, $N$ generally has to get bigger, and the relevant terms are sandwiched between the two horizontal lines. It's as though the terms of the sequence are being sucked into an ever-narrowing funnel.

While this natural approach may work well for some, it is problematic in 
a number of ways. For example, many students have difficulty with nested 
quantifiers. Instead of writing the definition of the limit as 
Given $\varepsilon > 0$ there exists $N$ such that $n > N$ implies $|a_n - a| < \varepsilon$,
one student wrote: 


A sequence $(a_n)$ tends to a limit $a$ for $\varepsilon > 0$ if there exists $N \in \mathbb{N}$ such that $|a_n - a| < \varepsilon$ provided $n > N$,


while another wrote: 


If $a_n \to a$, then there exists $\varepsilon > 0$, such that $|a_n - a| < \varepsilon$ for all $n > N$, where $N$ is a large positive integer.


At the very least, it is essential to be able to reproduce the limit definition 
correctly, and even then, there are subtleties. Many examples of limits in- 
volve a formula, and this can give the impression that a sequence approaches 
a limit, but is never equal to it. Some students taking a natural approach be- 
lieve that a constant sequence cannot tend to a limit ‘because it is already 
there’. Others may cope by separating convergence into two distinct ideas 
where some sequences approach the limit while others are at the limit. 

An alternative interpretation, showing real mathematical insight, occurred 
when one student concentrated on the formal definition and realised that in 
computing a value for $N$, some sequences get within a given value of $\varepsilon$ for smaller values of $N$ than others, and so ‘converge at a faster rate’. This led
to the insight that a constant sequence is ‘the fastest converger of them all’, 
because it is already there. Unlike his colleagues who saw a constant sequence 
as an exceptional case, this student saw the constant sequence as the simplest 
central example of all convergent sequences. It is a mark of true mathematical 
insight to include exceptional cases as part of the general theory. 

Some professors introduce convergence by interpreting it as a numerical calculation: given a numerical value of $\varepsilon > 0$, calculate a numerical value of $N$. For instance, given the sequence $(1/n)$ and $\varepsilon = 1/1000$, work out that $N = 1000$ will do the job, then generalise this idea for general $\varepsilon$ by taking $N$ bigger than $1/\varepsilon$.
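The numerical procedure just described can be sketched in Python for $a_n = 1/n$. Since $1/n < \varepsilon$ exactly when $n > 1/\varepsilon$, the least $N$ that works is $\lfloor 1/\varepsilon \rfloor$. The window search below is a hypothetical helper that inspects only finitely many terms, so it illustrates how $N$ depends on $\varepsilon$ but proves nothing. It also confirms the student's observation above: for a constant sequence, $N = 0$ works for every $\varepsilon$.

```python
from fractions import Fraction
from math import floor

def least_N_reciprocal(eps):
    """For a_n = 1/n and limit 0: least N such that n > N implies 1/n < eps.

    1/n < eps exactly when n > 1/eps, so the least such N is floor(1/eps)."""
    return floor(1 / eps)

def least_N_on_window(a, limit, eps, horizon):
    """Least N such that |a(n) - limit| < eps for every n with N < n <= horizon.

    Only finitely many terms are inspected: an illustration, not a proof."""
    N = horizon
    for n in range(horizon, 0, -1):
        if abs(a(n) - limit) >= eps:
            break
        N = n - 1
    return N

for eps in (Fraction(1, 10), Fraction(3, 10), Fraction(1, 1000)):
    assert least_N_on_window(lambda n: Fraction(1, n), 0, eps, 5000) == least_N_reciprocal(eps)

# A constant sequence is already at its limit: N = 0 works for every eps.
assert least_N_on_window(lambda n: 7, 7, Fraction(1, 10**6), 100) == 0
```

Exact `Fraction` arithmetic avoids floating-point rounding at the boundary case $n = 1/\varepsilon$.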

A numerical approach can be a helpful first step, but only learning how to 
perform a procedural solution can fail in a more general situation such as: 


Given a sequence $(a_n)$ that tends to 1, show that there exists a value of $N$ such that $a_n > \tfrac{1}{2}$ for all $n > N$.


In this case the numerical calculation is not appropriate since, as no formula for $a_n$ is specified, it is not possible to calculate a numerical value for $N$.
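What replaces the calculation is a choice of $\varepsilon$. Taking the required conclusion to be $a_n > \tfrac12$ for $n > N$, the definition is applied once with the specific value $\varepsilon = \tfrac12$, and no formula for $a_n$ is needed:

$$
\begin{aligned}
&\text{apply the definition of } a_n \to 1 \text{ with } \varepsilon = \tfrac12\text{: there exists } N \text{ such that}\\
&n > N \implies |a_n - 1| < \tfrac12 \implies -\tfrac12 < a_n - 1 < \tfrac12 \implies a_n > \tfrac12.
\end{aligned}
$$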
A subtler problem arises when proving that a sequence is not convergent, using here the idea of negating quantifiers by replacing $\neg\forall$ (not for all) by $\exists\neg$ (there exists not), and $\neg\exists$ by $\forall\neg$, as in chapter 6.

The statement that a sequence is not convergent takes the form 


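$$\neg\bigl(\forall \varepsilon > 0\ \exists N \in \mathbb{N} : \forall n > N\ |a_n - a| < \varepsilon\bigr).$$

Successively moving the ‘not’ symbol $\neg$ past the quantifiers gives

$$\exists \varepsilon > 0\ \neg\exists N \in \mathbb{N} : \forall n > N\ |a_n - a| < \varepsilon$$
$$\exists \varepsilon > 0\ \forall N \in \mathbb{N} : \neg\forall n > N\ |a_n - a| < \varepsilon$$
$$\exists \varepsilon > 0\ \forall N \in \mathbb{N} : \exists n > N\ \neg\bigl(|a_n - a| < \varepsilon\bigr)$$

and finally

$$\exists \varepsilon > 0\ \forall N \in \mathbb{N} : \exists n > N\ |a_n - a| \ge \varepsilon.$$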


This can now be expressed in words, as finding an $\varepsilon > 0$ such that, for all $N$, we can find $n > N$ such that $|a_n - a| \ge \varepsilon$.
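The negated statement has a concrete shape: choose one $\varepsilon$ first, then, for every $N$ an opponent proposes, exhibit some $n > N$. As a sketch (the sequence $a_n = (-1)^n$ and the function names are illustrative assumptions), the single choice $\varepsilon = 1$ defeats every candidate limit $a$. The reason is that consecutive terms are $1$ and $-1$, and $|1 - a| + |-1 - a| \ge 2$, so at least one of the two distances is at least 1.

```python
EPS = 1  # fixed once, before any N is given: the "there exists epsilon" step

def a(n):
    """The sequence a_n = (-1)^n, whose terms alternate -1, 1, -1, 1, ..."""
    return (-1) ** n

def witness(candidate_limit, N):
    """For any N, return some n > N with |a_n - candidate_limit| >= EPS.

    a_(N+1) and a_(N+2) are 1 and -1 in some order, and
    |1 - L| + |-1 - L| >= 2, so at least one of the distances is >= 1."""
    for n in (N + 1, N + 2):
        if abs(a(n) - candidate_limit) >= EPS:
            return n
    raise AssertionError("no witness found")

# The same EPS works for every candidate limit L and every N tried.
for L in (-2, -1, -0.5, 0, 0.3, 1, 1.7):
    for N in range(100):
        n = witness(L, N)
        assert n > N and abs(a(n) - L) >= EPS
```

The order of the quantifiers is visible in the code: `EPS` is a constant chosen in advance, while the witness `n` is allowed to depend on both `N` and the candidate limit.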

Such a technique can be developed by natural thinking about the def- 
inition, formal manipulation of the quantifiers, or procedural learning of 
the rules manipulating quantifiers. However, the technique is enhanced by 
making sense of how it operates, to overcome possible limitations of intui- 
tive meanings and to develop flexible forms of reasoning that will support 
coherent mathematical thinking. 


Building Formal Theories 


In the remainder of this chapter we offer an overview of how formal math- 
ematical theories can be organised efficiently by focusing first on a small list 
of related axioms to prove properties that can be deduced from them. These 
proven properties can then be used in new contexts that satisfy the given 
axioms, to build increasingly sophisticated theories. 

Several axiomatic systems start from a set with a single operation: here 
are a few. 


Semigroups and Groups 


Examples like the integers under addition or the non-zero integers under 
multiplication suggest thinking about a set X and a binary operation * on X. 
Possible properties might include: 
(1) $(a * b) * c = a * (b * c)$ for all $a, b, c \in X$. In this case $*$ is associative.

(2) There exists an element $e \in X$ satisfying $a * e = e * a = a$ for all $a \in X$. Such an element is an identity.

(3) If an identity $e$ exists, then for all $a \in X$ there exists $b \in X$ such that $a * b = b * a = e$. Such an element $b$ is called an inverse for $a$.

(4) $a * b = b * a$ for all $a, b \in X$. In this case $*$ is commutative.


A set X with a binary operation * satisfying (1) and (2) is a semigroup. If 
(3) is also satisfied, it is a group. If (1)-(4) all hold then it is a commutative (or 
abelian) group. 
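For a finite set, properties (1) to (4) can be checked exhaustively. The Python sketch below (function names hypothetical, and the operation is assumed to be closed on the set) classifies a few small examples. It follows this chapter's convention that a semigroup satisfies (1) and (2), so that "semigroup" here includes an identity.

```python
from itertools import permutations, product

def find_identity(X, op):
    """Return an element e with op(a, e) == op(e, a) == a for all a in X, or None."""
    for e in X:
        if all(op(a, e) == a and op(e, a) == a for a in X):
            return e
    return None

def classify(X, op):
    """Classify a finite set X under a closed binary operation op via (1)-(4)."""
    associative = all(op(op(a, b), c) == op(a, op(b, c))
                      for a, b, c in product(X, repeat=3))
    e = find_identity(X, op)
    if not associative or e is None:
        return "neither"
    has_inverses = all(any(op(a, b) == e and op(b, a) == e for b in X) for a in X)
    if not has_inverses:
        return "semigroup"
    commutative = all(op(a, b) == op(b, a) for a, b in product(X, repeat=2))
    return "commutative group" if commutative else "group"

Z6 = range(6)
print(classify(Z6, lambda a, b: (a + b) % 6))            # commutative group
print(classify(Z6, lambda a, b: (a * b) % 6))            # semigroup: 0 has no inverse
print(classify(range(1, 7), lambda a, b: (a * b) % 7))   # commutative group

# Permutations of {0, 1, 2} under composition: a non-commutative group.
S3 = list(permutations(range(3)))
compose = lambda p, q: tuple(p[q[i]] for i in range(3))
print(classify(S3, compose))                             # group
```

Exhaustive checking only works for finite sets; for $\mathbb{Z}$, $\mathbb{Q}$, or $\mathbb{H}$ the properties must be proved, as in the examples that follow.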

Several of these properties are already familiar in various contexts. 


Examples 12.1: 


(i) $\mathbb{N}_0$ is a semigroup with identity 0 under the binary operation $+$.
(ii) $\mathbb{N}_0$ is a semigroup under multiplication with identity 1.
(iii) $\mathbb{Z}$ is a semigroup under multiplication.
(iv) $\mathbb{Z}$ is a group under addition. The identity is zero and the inverse of $n \in \mathbb{Z}$ is $-n$ because $n + 0 = 0 + n = n$ and $n + (-n) = (-n) + n = 0$.
(v) The non-zero elements of $\mathbb{Z}$ form a semigroup under multiplication with identity element 1.
(vi) The non-zero elements of $\mathbb{Q}$ (or of $\mathbb{R}$ or $\mathbb{C}$) form a group under multiplication with identity 1, and the inverse of $r$ is $1/r$.
(vii) The non-zero elements of $\mathbb{H}$ form a group under multiplication. The identity is 1, and the inverse of $q \in \mathbb{H} \setminus \{0\}$ is $\bar{q}/|q|^2$.


Examples (i)-(vi) are commutative; example (vii) is non-commutative. 
We consider groups in greater detail in chapter 13 to show how they arise 
naturally in number systems and in many other contexts. We include formal 
deductions that reveal structural features of groups. 


Rings and Fields 


Rings and fields have already been introduced in chapter 9. They can be 
described more succinctly using the notions of group and semigroup. 

In these terms, a ring consists of a set $R$ and two binary operations $+$ and $\times$, such that $R$ is a commutative group under $+$, a semigroup under $\times$ (where $a \times b$ is written as $ab$), and the two operations are related by the distributive laws $a(b + c) = ab + ac$, $(b + c)a = ba + ca$ for all $a, b, c \in R$. If multiplication is commutative then $R$ is called a commutative ring. Thus $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{C}$ are commutative rings and $\mathbb{H}$ is a (non-commutative) ring.
A field is a set $F$ with two commutative operations $+$ and $\times$ such that $F$ is a group under $+$ with identity 0, $F \setminus \{0\}$ is a group under $\times$ (with identity 1), and the two operations are related by the distributive law $a(b + c) = ab + ac$ for all $a, b, c \in F$.

Examples include Q, R, and C, but not Z (because there are non-zero 
elements without multiplicative inverses) or H (because multiplication is 
non-commutative). 

However, all of these systems except $\mathbb{Z}$ are division rings, consisting of a set $D$ with operations $+$, $\times$ such that $D$ is a commutative group under $+$ with identity 0, $D \setminus \{0\}$ is a group under $\times$ (not necessarily commutative), and the distributive laws hold: $a(b + c) = ab + ac$, $(b + c)a = ba + ca$ for all $a, b, c \in D$. Examples include $\mathbb{Q}$, $\mathbb{R}$, $\mathbb{C}$, and $\mathbb{H}$.

Other formal systems can be designed based on our intuitive experiences, 
inspiring set-theoretic definitions that may lead to interesting structures. 
Such activities used to be quite an extensive industry when mathematicians 
were coming to grips with axiomatic structures, but nowadays new axiom 
systems have to prove their worth by helping to advance other areas of the 
subject. 


Vector Spaces 


As an example of an axiomatic system that arises in a wide range of 
situations, yet has a clear natural structure, we consider how points in three- 
dimensional space can be described symbolically by selecting axes and using 
coordinates $x, y, z$.


Fig. 12.2 Points in three-dimensional space


A point in three-dimensional space corresponds to an ordered triple of 
real numbers $(x, y, z)$. So we can regard space as being the set $\mathbb{R}^3$ of ordered triples of real numbers. We add such triples using the obvious rule

$$(x_1, y_1, z_1) + (x_2, y_2, z_2) = (x_1 + x_2,\; y_1 + y_2,\; z_1 + z_2).$$
Addition is associative and commutative, the triple $(0, 0, 0)$ is an identity, and the additive inverse of $(x, y, z)$ is $(-x, -y, -z)$. Therefore $\mathbb{R}^3$ is a commutative group under addition. We can also multiply a triple $(x, y, z)$ by an element $a \in \mathbb{R}$ to get $a(x, y, z) = (ax, ay, az)$. This operation relates to addition and multiplication on $\mathbb{R}$ according to the rules:

$$(a + b)(x, y, z) = a(x, y, z) + b(x, y, z),$$
$$(ab)(x, y, z) = a\bigl(b(x, y, z)\bigr),$$
$$1(x, y, z) = (x, y, z),$$

and to addition of vectors by:

$$a\bigl((x_1, y_1, z_1) + (x_2, y_2, z_2)\bigr) = a(x_1, y_1, z_1) + a(x_2, y_2, z_2).$$
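These rules can be spot-checked on sample data. The Python sketch below represents vectors as tuples and scalars as exact `Fraction` values; the helper names `vec_add`, `scale`, and `check_rules` are hypothetical, and a few sample checks illustrate the rules without proving them.

```python
from fractions import Fraction
from itertools import product

def vec_add(u, v):
    """Componentwise addition of two n-tuples."""
    assert len(u) == len(v), "vectors must have the same number of coordinates"
    return tuple(x + y for x, y in zip(u, v))

def scale(a, u):
    """Multiply every coordinate of the n-tuple u by the scalar a."""
    return tuple(a * x for x in u)

def check_rules(scalars, vectors):
    """Spot-check the group and scalar rules on sample data (exact arithmetic)."""
    zero = tuple(0 for _ in vectors[0])
    for u, v in product(vectors, repeat=2):
        assert vec_add(u, v) == vec_add(v, u)                  # commutative addition
    for u in vectors:
        assert vec_add(u, zero) == u                           # identity (0, 0, 0)
        assert vec_add(u, scale(-1, u)) == zero                # additive inverse
        assert scale(1, u) == u                                # 1(x, y, z) = (x, y, z)
    for a, b in product(scalars, repeat=2):
        for u in vectors:
            assert scale(a + b, u) == vec_add(scale(a, u), scale(b, u))
            assert scale(a * b, u) == scale(a, scale(b, u))
    for a in scalars:
        for u, v in product(vectors, repeat=2):
            assert scale(a, vec_add(u, v)) == vec_add(scale(a, u), scale(a, v))
    return True

F = Fraction
scalars = [F(0), F(1), F(-2), F(3, 7)]
triples = [(F(1), F(2), F(3)), (F(-1, 2), F(0), F(5)), (F(4), F(-3), F(1, 3))]
assert check_rules(scalars, triples)
```

Nothing in `vec_add` or `scale` depends on the tuples having exactly three entries, so the same code applies unchanged to $n$-tuples.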


Since we live in three-dimensional space—at least, that’s the natural image 
on a human scale—it may seem strange to talk about higher dimensions. 
However, Einstein’s theory of relativity uses time as a fourth variable, so 
that a point (x, y, z) at time t is given by the ordered quadruple (x, y, z, t). 
What is the fifth dimension? The answer is that this approach is a diversion from the mainstream of mathematics. Time is a fourth dimension, not the only one. ‘The’ fourth dimension is a misnomer, so the question makes even less sense for ‘the’ fifth. Newtonian and relativistic physics both constrain us to live in three-dimensional space with time as some sort of fourth
dimension, but higher dimensions have genuine mathematical significance. 
(If string theory—one of the most popular proposals to unify relativity with 
quantum mechanics—is correct, space might really have 10 or perhaps 11 
dimensions. For various reasons, the extra ones don’t show up in daily life.) 
There are sound mathematical reasons for defining spaces with any number 
of dimensions, even infinity. Such spaces arise naturally from mainstream 
mathematics. 

For instance, describing the positions of two independent points $(x_1, y_1, z_1)$
and $(x_2, y_2, z_2)$ in three-dimensional space requires six real numbers. These
can be put in order as a single sextuple $(x_1, y_1, z_1, x_2, y_2, z_2)$, which now
describes the position of them both. So the ‘configuration space’ for the
two particles (the set of possible arrangements) has six dimensions, and it
makes sense to denote it by $\mathbb{R}^6$.

Now consider a rigid body in space—for example, an asteroid in the aster- 
oid belt. To describe its position uniquely, we have to specify the positions of 
three non-collinear points P, Q, R in the body. 
Fig. 12.3 A rigid body in space


Suppose that the distances $PQ, QR, RP$ are $a, b, c$. Then we can
place $P$ at the point $(x_1, y_1, z_1)$ and move $Q$ to any point $(x_2, y_2, z_2)$ subject
only to the restriction that the distance from $(x_1, y_1, z_1)$ to $(x_2, y_2, z_2)$
is $a$. By Pythagoras’ theorem (in three dimensions, or applied twice in
two planes), the distance between $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$ in $\mathbb{R}^3$ is
$\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$. So the distance condition can be
stated as

$$(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2 = a^2. \qquad (12.1)$$


Finally we can rotate the body around the axis $PQ$ to put $R$ at a point
$(x_3, y_3, z_3)$ subject only to the restrictions $QR = b$, $RP = c$:

$$(x_2 - x_3)^2 + (y_2 - y_3)^2 + (z_2 - z_3)^2 = b^2, \qquad (12.2)$$
$$(x_3 - x_1)^2 + (y_3 - y_1)^2 + (z_3 - z_1)^2 = c^2. \qquad (12.3)$$


Thus the position of the rigid body is determined by the nine coordinates
$x_1, y_1, z_1, x_2, y_2, z_2, x_3, y_3, z_3$, subject to the equations (12.1)–(12.3). It is possible,
and by no means bizarre, to consider this as an ordered 9-tuple

$$(x_1, y_1, z_1, x_2, y_2, z_2, x_3, y_3, z_3) \in \mathbb{R}^9,$$

so that the rigid body’s position is a point in $\mathbb{R}^9$ subject only to
equations (12.1)–(12.3).
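Because the position is just a point of $\mathbb{R}^9$ satisfying three equations, testing whether a given 9-tuple is an admissible position is a purely mechanical check. The following is a minimal Python sketch under our own assumptions: the 9-tuple is an ordinary tuple, and a small floating-point tolerance `tol` stands in for exact equality. The names `squared_distance` and `is_rigid_position` are hypothetical, chosen for illustration.

```python
def squared_distance(p, q):
    """Square of the Euclidean distance between two points of R^3."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def is_rigid_position(config, a, b, c, tol=1e-9):
    """Check whether the 9-tuple (x1, y1, z1, x2, y2, z2, x3, y3, z3)
    satisfies equations (12.1)-(12.3) for side lengths PQ = a, QR = b, RP = c."""
    if len(config) != 9:
        raise ValueError("expected a 9-tuple")
    P, Q, R = config[0:3], config[3:6], config[6:9]
    return (abs(squared_distance(P, Q) - a ** 2) <= tol        # (12.1)
            and abs(squared_distance(Q, R) - b ** 2) <= tol    # (12.2)
            and abs(squared_distance(R, P) - c ** 2) <= tol)   # (12.3)


# A 3-4-5 right triangle: P at the origin, Q on the x-axis, R in the xy-plane.
print(is_rigid_position((0, 0, 0, 3, 0, 0, 3, 4, 0), a=3, b=4, c=5))  # True
# Moving R to (0, 4, 0) breaks the condition QR = 4.
print(is_rigid_position((0, 0, 0, 3, 0, 0, 0, 4, 0), a=3, b=4, c=5))  # False
```

Translating all three points by the same vector leaves every distance unchanged, so the translated 9-tuple passes the same check.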

Examples like this in mathematics are legion. Far from restricting ‘spaces’
to $\mathbb{R}^3$, it is a positive advantage to consider the set $\mathbb{R}^n$ of all $n$-tuples of real
numbers, for any $n \in \mathbb{N}$. It should now be obvious how to do this. Define
addition and multiplication by real numbers in $\mathbb{R}^n$ by:

$$(x_1, x_2, \ldots, x_n) + (y_1, y_2, \ldots, y_n) = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n),$$

$$a(x_1, x_2, \ldots, x_n) = (ax_1, ax_2, \ldots, ax_n).$$
These operations satisfy the same properties that we listed for $\mathbb{R}^3$. For
convenience write $v = (x_1, x_2, \ldots, x_n)$, $w = (y_1, y_2, \ldots, y_n)$. Then these properties
can be stated as:

$$(a + b)v = av + bv,$$
$$(ab)v = a(bv),$$
$$1v = v,$$
$$a(v + w) = av + aw \quad \text{for all } a, b \in \mathbb{R},\ v, w \in \mathbb{R}^n.$$
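The componentwise definitions translate directly into code, and the four properties can then be spot-checked on sample values. This Python sketch is only illustrative: it uses rational entries (`fractions.Fraction`) instead of arbitrary reals so that equality tests are exact, and the helper names `add`, `scale`, and `check_properties` are our own. A finite spot-check is of course not a proof; the proof is the componentwise argument in the text.

```python
from fractions import Fraction
from itertools import product


def add(v, w):
    """Componentwise sum of two n-tuples."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same length")
    return tuple(x + y for x, y in zip(v, w))


def scale(a, v):
    """Multiply every component of the n-tuple v by the scalar a."""
    return tuple(a * x for x in v)


def check_properties(scalars, vectors):
    """Test the four rules linking scalars and vectors on all sample combinations."""
    for a, b in product(scalars, repeat=2):
        for v, w in product(vectors, repeat=2):
            if scale(a + b, v) != add(scale(a, v), scale(b, v)):
                return False
            if scale(a * b, v) != scale(a, scale(b, v)):
                return False
            if scale(1, v) != v:
                return False
            if scale(a, add(v, w)) != add(scale(a, v), scale(a, w)):
                return False
    return True


scalars = [Fraction(-3), Fraction(1, 2), Fraction(5, 7)]
vectors = [(Fraction(1), Fraction(2), Fraction(-1), Fraction(4)),
           (Fraction(0), Fraction(3, 2), Fraction(2), Fraction(-5))]
print(check_properties(scalars, vectors))  # True
```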


This is the genesis of the idea of a vector space. Consider a set $V$ with
a binary operation $+$. Then require a map $m : \mathbb{R} \times V \to V$, where for
convenience we write $m(a, v)$ as $av$. $V$ is said to be a vector space over $\mathbb{R}$ if:

(VS1) $V$ is a commutative group under $+$.

(VS2) For all $a, b \in \mathbb{R}$ and $v, w \in V$:
$$(a + b)v = av + bv,$$
$$(ab)v = a(bv),$$
$$1v = v,$$
$$a(v + w) = av + aw.$$


These axioms hold for R”, but there are many other interesting examples 
of vector spaces. 

For instance, let $V$ be the set of all functions from $\mathbb{R}$ to $\mathbb{R}$. Then $f \in V$
means that $f : \mathbb{R} \to \mathbb{R}$. We can add two functions $f, g \in V$ to get
$f + g : \mathbb{R} \to \mathbb{R}$ by defining $(f + g)(x) = f(x) + g(x)$ for all $x \in \mathbb{R}$. For example, if
$f(x) = x^2 + x$, $g(x) = 3x + 2$, then $f(x) + g(x) = x^2 + x + 3x + 2$. Multiplication by
$a \in \mathbb{R}$ is given by $(af)(x) = a(f(x))$ for all $x \in \mathbb{R}$. An example is $f(x) = x^3 + x^2$,
$a = -3$, in which case $(af)(x) = -3(x^3 + x^2)$. The set $V$ is a vector space over $\mathbb{R}$
according to the given definition. In this case the elements of $V$ are functions.
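Since the vectors here are functions, the operations build new functions out of old ones. A short Python sketch makes this concrete, with Python functions standing in for maps $\mathbb{R} \to \mathbb{R}$ (an assumption: floats or integers play the role of real numbers). The helper names `add_functions` and `scale_function` are hypothetical. The sketch uses $f(x) = x^2 + x$ and $g(x) = 3x + 2$ for both operations.

```python
def add_functions(f, g):
    """Return the function f + g defined by (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)


def scale_function(a, f):
    """Return the function af defined by (af)(x) = a * f(x)."""
    return lambda x: a * f(x)


def f(x):
    return x ** 2 + x


def g(x):
    return 3 * x + 2


h = add_functions(f, g)    # h(x) = x^2 + x + 3x + 2
k = scale_function(-3, f)  # k(x) = -3(x^2 + x)

print(h(2))  # 14, since f(2) + g(2) = 6 + 8
print(k(2))  # -18, since -3 * f(2) = -3 * 6
```

Note that `h` and `k` are themselves functions, that is, elements of the same set $V$, which is exactly the closure the vector space axioms require.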

Vector spaces occur in unexpected places, too. Suppose, for example, we
try to find the solution $y = f(x)$ of the differential equation

$$\frac{d^2y}{dx^2} + \frac{dy}{dx} + 10y = 0.$$

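(Here we assume familiarity with calculus.) Then for twice-differentiable functions
$f : \mathbb{R} \to \mathbb{R}$, $g : \mathbb{R} \to \mathbb{R}$ and real numbers $a, b \in \mathbb{R}$, we find

$$\frac{d}{dx}\big(af(x) + bg(x)\big) = a\frac{df(x)}{dx} + b\frac{dg(x)}{dx},$$

$$\frac{d^2}{dx^2}\big(af(x) + bg(x)\big) = a\frac{d^2f(x)}{dx^2} + b\frac{d^2g(x)}{dx^2}.$$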
Therefore

$$\frac{d^2}{dx^2}\big(af(x) + bg(x)\big) + \frac{d}{dx}\big(af(x) + bg(x)\big) + 10\big(af(x) + bg(x)\big)$$
$$= a\left\{\frac{d^2f(x)}{dx^2} + \frac{df(x)}{dx} + 10f(x)\right\} + b\left\{\frac{d^2g(x)}{dx^2} + \frac{dg(x)}{dx} + 10g(x)\right\}.$$

This implies that if $y = f(x)$ and $y = g(x)$ are solutions of the differential
equation, then each of the expressions in curly brackets is zero, so
$y = af(x) + bg(x)$ is also a solution.

Let $S$ be the set of twice-differentiable functions that are solutions of the
differential equation. Then (putting $a = b = 1$)

$$f, g \in S \Rightarrow f + g \in S,$$

and it is easily seen that $S$ is a commutative group under $+$. Similarly (putting
$b = 0$),

$$a \in \mathbb{R},\ f \in S \Rightarrow af \in S.$$


Checking axioms (VS1) and (VS2), we see that the set of solutions $S$ of this
differential equation is a vector space over $\mathbb{R}$. (There is a solution corresponding
to each pair of initial conditions $y(0) = p$, $y'(0) = q$, for any $p, q \in \mathbb{R}$. So
there are plenty of solutions, and this statement is not vacuous.)

Our brief description of mathematical structures might give the impres- 
sion that modern algebra is just an arid catalogue of axioms. To counteract 
that impression, we mention some of the striking deductions that have been 
made using this approach. 

For two thousand years, since the time of the ancient Greeks, mathemat- 
icians wondered whether it is possible to trisect any angle using ruler and 
compass alone. It took an intriguing blend of vector space theory and field 
theory to show that the angle 60° (and many others) cannot be trisected in 
this way (for details, see [32]). 

The method for solving a quadratic equation

$$ax^2 + bx + c = 0$$

by a process that we now encapsulate in the formula

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

was, in effect, known to the ancient Babylonians more than 3000 years ago. In
the sixteenth century, Italian mathematicians developed more complicated
algebraic formulas for any cubic

$$ax^3 + bx^2 + cx + d = 0$$

and any quartic

$$ax^4 + bx^3 + cx^2 + dx + e = 0.$$


For well over two centuries the search continued for an algebraic formula 
for the solution of a quintic 


$$ax^5 + bx^4 + cx^3 + dx^2 + ex + f = 0.$$


In the nineteenth century an intricate chain of deduction using field theory 
and group theory showed that no algebraic formula for the quintic exists 
(see [32]). 

Various generalisations of the notion of vector space over R are possible. 
For instance if in the definition of a vector space we replace R by a field F, we 
get a vector space over F. If we replace it by a ring R, then we get the notion of 
a module over R. The study of these systems and their applications is central 
to modern algebra. 

However, not only can we deduce properties formally: we can seek to 
prove a structure theorem. Here is an example for a vector space V over a 
field F. 

Say that $v = a_1v_1 + \cdots + a_nv_n$, where $a_1, \ldots, a_n \in F$, is a linear combination
of the vectors $v_1, \ldots, v_n \in V$. For example, $(a, a, b)$ is a linear combination
of $(1, 1, 0)$ and $(0, 0, 1)$ for all $a, b \in F$, because $(a, a, b) = a(1, 1, 0) + b(0, 0, 1)$.

More generally, any vector $(x, y, z) \in \mathbb{R}^3$ is a linear combination of the
three vectors $i = (1, 0, 0)$, $j = (0, 1, 0)$, $k = (0, 0, 1)$, because

$$(x, y, z) = xi + yj + zk.$$


If a vector space $V$ has a set of vectors $v_1, \ldots, v_n$ such that every vector $v \in V$
can be written as a linear combination,

$$v = a_1v_1 + \cdots + a_nv_n \quad \text{where } a_1, \ldots, a_n \in F,$$

then this set of vectors is called a spanning set for $V$.

For instance, the vectors $i, j, k$ form a spanning set for $\mathbb{R}^3$. They have
another special property. A set of vectors $v_1, \ldots, v_n \in V$ is linearly independent if

$$a_1v_1 + \cdots + a_nv_n = 0 \quad \text{implies} \quad a_1 = \cdots = a_n = 0.$$
If a spanning set is also linearly independent, then the linear representation
is unique, for if a vector $v \in V$ has two possible representations,

$$v = a_1v_1 + \cdots + a_nv_n = b_1v_1 + \cdots + b_nv_n,$$

then

$$(a_1 - b_1)v_1 + \cdots + (a_n - b_n)v_n = 0$$

and, by linear independence,

$$a_1 - b_1 = 0, \ldots, a_n - b_n = 0,$$

so

$$a_1 = b_1, \ldots, a_n = b_n.$$


A set of vectors that is both a spanning set and linearly independent is
called a basis for the vector space $V$. A vector space that has such a finite basis
is said to be finite dimensional.

A first course in vector space theory (‘linear algebra’) usually concentrates 
on finite-dimensional vector spaces over the fields R, C, or a general field 
F, and proves that any two bases of a given finite-dimensional vector space 
have the same number of elements. This number is called the dimension of 
the vector space. Now, if $v_1, \ldots, v_n$ is a basis then any element $v \in V$ can be
written uniquely as

$$v = a_1v_1 + \cdots + a_nv_n.$$


So the map $f : V \to F^n$ for which $f(a_1v_1 + \cdots + a_nv_n) = (a_1, \ldots, a_n)$ is
a structure-preserving isomorphism. This leads to a structure theorem for
finite-dimensional vector spaces:


Theorem 12.2: Every finite-dimensional vector space $V$ over a field $F$ is
isomorphic to $F^n$, where $n$ is the dimension of $V$.


We omit the proof since the theorem is for illustrative purposes only, but 
the above discussion includes the key ideas. 
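The isomorphism $f$ can actually be computed: given a basis $v_1, \ldots, v_n$ and a vector $v$, we solve $a_1v_1 + \cdots + a_nv_n = v$ for the coefficients. The Python sketch below does this by Gauss-Jordan elimination, a standard technique that the text does not describe. It assumes the field is $\mathbb{Q}$ (exact arithmetic via `fractions.Fraction`) and that the basis consists of $n$ vectors in $\mathbb{Q}^n$; the function name `coordinates` is hypothetical.

```python
from fractions import Fraction


def coordinates(basis, v):
    """Return (a1, ..., an) with v = a1*v1 + ... + an*vn.

    basis: n vectors in Q^n, assumed linearly independent; v: a vector in Q^n.
    Raises ValueError if the vectors turn out to be linearly dependent.
    """
    n = len(basis)
    # Augmented matrix: column j holds basis vector j, the last column holds v.
    rows = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(v[i])]
            for i in range(n)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it up.
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("vectors are not linearly independent")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Scale the pivot row so the pivot entry is 1.
        p = rows[col][col]
        rows[col] = [entry / p for entry in rows[col]]
        # Clear this column from every other row.
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [er - factor * ec for er, ec in zip(rows[r], rows[col])]
    return tuple(row[n] for row in rows)


basis = [(1, 1, 0), (0, 0, 1), (1, 0, 0)]
print(coordinates(basis, (2, 2, 5)))   # coefficients 2, 5, 0 (as Fractions)
print(coordinates(basis, (3, 1, -4)))  # coefficients 1, -4, 2 (as Fractions)
```

Because the representation is unique, adding two vectors adds their coordinate tuples, which is one way to see that $f$ preserves the structure.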

This theorem provides a natural symbolic interpretation of a finite-dimensional
space, in which the vectors are given by coordinates. It then
turns out that linear maps between vector spaces are given by matrices. If
$F = \mathbb{R}$ and $n = 2$ or $3$, the vectors can be represented visually in two- or
three-dimensional space.

The details can be found in any first course on vector spaces. They lay out 
a template for later courses that study other axiomatic structures. 
The Way Ahead


In more sophisticated developments of algebra, the systems studied in- 
variably consist of sets with various operations defined on those sets, with 
functions from one system to another preserving the structure. These struc- 
tures have many applications; indeed, the applications often dictate the most 
profitable structures. Applications come from physics, engineering, biology, 
chemistry, economics, statistics, computing, social science, psychology, and 
many other areas. We now stand on a springboard, ready to leap into the 
higher realms of mathematical thought. As examples of how formal systems 
operate in mathematics, the next three chapters study three typical formal 
structures: groups, infinite cardinal numbers, and infinitesimal quantities. 

The notion of a group occurs naturally in many areas, including symmetries
of geometric objects and permutations of sets. We prove a structure
theorem showing that any group can be viewed as a group of permutations of
a set.

The second example is a generalisation of finite counting to infinite sets, 
using infinite cardinal numbers. These have an arithmetic of their own, 
deriving from the arithmetic of finite counting but with some significant 
differences. 

The third concept generalises the idea of finite measurement by placing the
real numbers in an even larger ordered field $K$ that contains $\mathbb{R}$ as an ordered
subfield. There are many possible candidates for $K$, but all of them share the
same structure theorem. An element $x \in K$ is said to be finite if $a < x < b$
for some $a, b \in \mathbb{R}$. It is said to be an infinitesimal if $0 < |x| < a$ for all positive
$a \in \mathbb{R}$. The structure theorem says that if $x$ is finite, then $x$ is uniquely of the
form $x = c + e$ where $c \in \mathbb{R}$ and $e$ is zero or infinitesimal.

This has a profound consequence in the longer-term development of 
mathematics. While formal mathematics tells us that there are no infini- 
tesimals in the real numbers, it also tells us that any larger ordered field 
must contain infinitesimals. It is possible to develop a theoretical framework 
(called non-standard analysis) that allows the logical use of infinitesimals, 
but this requires a strengthening of the logical foundations. (We said in chapter
1 that mathematics grows like a tree: not only do its branches grow upwards, its
roots must also become stronger to support the larger structure.)

The existence of a structure with infinitesimals alongside a theory of infin- 
ite cardinals which excludes infinitesimals is not contradictory because they 
occur in two different contexts. Cardinal numbers generalise counting in N 
and (apart from 1) the elements do not have inverses in N. Non-standard 
analysis generalises measuring in R, where multiplicative inverses do ex- 
ist. This is typical of what happens when we generalise familiar systems in 
different ways. In mathematical analysis within the real numbers there are
no infinitesimals, so the set-theoretic epsilon-delta method is appropriate. 
In fields larger than the reals, infinitesimals occur and they can be used to 
develop the theory of non-standard analysis. Meanwhile, applied mathemat- 
icians can work well with ‘arbitrarily small numerical quantities’ in a natural 
way. What matters is that the approach used is appropriate for the speciality 
concerned. 

The approach followed in this book is to offer a foundation for a full range 
of mathematical thinking, combining natural visual and symbolic methods 
with formal definitions and formal proof. It offers a preparation for various 
future developments, be they in pure mathematics with a focus on mathem- 
atical analysis, an alternative logical approach using infinitesimal methods, 
or a more pragmatic approach that justifies the intuition of engineers and 
physicists, clearly based on sound formal foundations. 


Exercises 


This chapter offers a broader picture of a formal approach to mathematics. 
Given the broad sweep of ideas, we do not set a list of specific examples to 
practise at this time. Far more important is to reflect and consider how your 
ideas are progressing. A useful exercise is to re-read the opening chapter and 
look at the notes you may have kept on the exercises in that chapter. How are 
your views changing? 

Look through this chapter again and write some notes to help you think 
through the advantages and disadvantages of a formal approach to mathematics.
Don’t just take our word for it: explain to yourself the reasons in favour
of a formal approach, so that you grasp how the formal approach works for 
you and make explicit any problems that may concern you. Share your in- 
sights and concerns with others so that you can make better sense of the 
reasons for a formal approach. Then you can use these insights to help make 
sense of the ideas in the next part of the book. 
CHAPTER 13


Permutations and Groups 


In this chapter we develop some basic aspects of group theory to illustrate
how axiomatic systems can be used to generalise features found in
concrete mathematical systems. Groups are absolutely fundamental to
modern mathematics. They originated from advanced areas of algebra and
geometry, but the underlying concept turned out to be very simple, though
sophisticated.

We begin with a practical example: the notion of a permutation. This is a 
way to rearrange the elements of a set X. For example, if X = {1,2, 3} then 
we might rearrange the order 1, 2, 3 to 1, 3, 2. We formulate this concept as 
a bijection $\sigma : X \to X$. The set of all permutations of a fixed set $X$ has several
pleasant algebraic properties, and from these and other examples we derive 
a short list of axioms to define the formal notion of a group. We then prove 
some basic theorems about groups, including a structure theorem showing 
that every group can be considered as a group of permutations. This the- 
orem tells us that a group is not merely an abstract concept: there are ways 
to imagine groups visually and to manipulate their elements symbolically. 
We can then build up new insights in the theory that involve both formal 
proof of theorems and also natural ways of making sense of their structure. 
These ideas will arise in a range of different situations in later mathematical 
courses, so it is worth gaining experience of their general properties. 


Permutations 


In everyday life, we often find ourselves arranging some set of objects in 
different ways, or choosing one arrangement out of many possibilities. For 
example, several guests are coming to dinner and we have to decide who sits 
where. Or we're playing a game of cards and start by shuffling the pack. Ini- 
tially, mathematicians thought of a permutation as one such arrangement. 
For example, if the objects were the symbols x, y, z, then the permutations 
were ordered triples like (z, y, x) or (y, x, z). There are six such triples: 
$$(x, y, z) \quad (x, z, y) \quad (y, x, z) \quad (y, z, x) \quad (z, x, y) \quad (z, y, x).$$


However, mathematicians now focus not on the order of the numbers, but 
on the way one arrangement is changed to get another one. For example, 
when the order is reversed, so that the triple (x, y, z) is changed into (z, y, x), 
the symbol in position 1 is initially x, but ends up as z. Similarly the symbol 
in position 2 is initially y, and ends up as y, while the symbol in position 3 is 
initially z, but ends up as x. This change can be described by a function 


$$\sigma : \{1, 2, 3\} \to \{1, 2, 3\}$$
defined by
$$\sigma(1) = 3, \quad \sigma(2) = 2, \quad \sigma(3) = 1.$$


What matters here is the numbers that give the positions of the symbols 
in the list. The symbols themselves tell us how to change these numbers. 
The modern approach is more elegant, leading to a simple and precise 
definition: 


Definition 13.1: A permutation of a set $X$ is a bijection $\sigma : X \to X$.


When $X$ is finite (and in practice not too large) there is a useful notation
for permutations, which rewrites a list of rules like $\sigma(3) = 1$ in a compact
form:

$$\sigma = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix}. \qquad (13.1)$$

The top row lists the elements of $X$. Beneath each element $x$ is its image $\sigma(x)$
under $\sigma$.

In this notation, the elements of $X$ can be listed in any order in the top
row. Provided we stick to the rule that $\sigma(x)$ is written underneath $x$, changing
the order of the elements makes no difference to $\sigma$. For example, the
permutation in (13.1) could be written as

$$\begin{pmatrix} 2 & 1 & 3 \\ 2 & 3 & 1 \end{pmatrix},$$

which conveys the same information. The possibility of changing the order in
this manner, which we will shortly see is very useful, is one of the reasons why
focusing on some specific ordering of the elements of $X$ is not a good idea.
First we note that the composition¹ of functions $\sigma \circ \tau$ is defined to be
$\sigma \circ \tau(x) = \sigma(\tau(x))$, which means: first do $\tau$ and then $\sigma$.
Permutations of a given set $X$ have three basic properties:


Theorem 13.2:


(1) The identity $i_X$ is a permutation of $X$.

(2) Every permutation $\sigma$ of $X$ has an inverse $\sigma^{-1}$, which is also a permutation of $X$.

(3) If $\sigma$ and $\tau$ are permutations of $X$, then so is their composition $\sigma \circ \tau$.


Proof: 


(1) The identity is obviously a bijection, as remarked in chapter 5 when it 
was defined. 

(2) Since $\sigma$ is a bijection, it has an inverse by theorem 5.17(c) of chapter 5.
Clearly $\sigma^{-1}$ is also a bijection.

(3) This follows from proposition 5.20 of chapter 5. 


When X is finite, we can calculate the inverse of a permutation and the 
composition of two permutations using the notation introduced above. For 
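example, suppose that $X = \{1, 2, 3, 4, 5, 6, 7\}$ and

$$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 7 & 6 & 5 & 4 & 3 & 2 & 1 \end{pmatrix},$$

$$\tau = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 5 & 3 & 6 & 1 & 4 & 7 & 2 \end{pmatrix}.$$

To find $\tau^{-1}$, we just swap the two rows:

$$\tau^{-1} = \begin{pmatrix} 5 & 3 & 6 & 1 & 4 & 7 & 2 \\ 1 & 2 & 3 & 4 & 5 & 6 & 7 \end{pmatrix}.$$

If necessary, we can rearrange the columns so that the numbers in the first
row are in numerical order 1–7. Or we can observe that the number lying
above 1 is 4, the number lying above 2 is 7, the number lying above 3 is 2,
and so on. Either way, we get the equivalent expression

$$\tau^{-1} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 4 & 7 & 2 & 5 & 1 & 3 & 6 \end{pmatrix}.$$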
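Swapping the two rows is exactly what inverting a dictionary does, so the calculation above can be sketched in a few lines of Python. Here a permutation is stored as a `dict` mapping each $x$ to $\sigma(x)$, an encoding of our own choosing; `two_row`, `inverse`, and `bottom_row` are hypothetical helper names.

```python
def two_row(top, bottom):
    """Build a permutation, stored as a dict x -> sigma(x), from two-row notation."""
    if len(top) != len(bottom):
        raise ValueError("rows must have the same length")
    perm = dict(zip(top, bottom))
    # A bijection X -> X: the top entries are distinct and the bottom row
    # is a rearrangement of the top row.
    if len(perm) != len(top) or sorted(top) != sorted(bottom):
        raise ValueError("not a permutation")
    return perm


def inverse(perm):
    """Swap the two rows: sigma(x) = y becomes sigma^{-1}(y) = x."""
    return {y: x for x, y in perm.items()}


def bottom_row(perm, order):
    """Read off the images of the elements listed in `order`."""
    return [perm[x] for x in order]


tau = two_row([1, 2, 3, 4, 5, 6, 7], [5, 3, 6, 1, 4, 7, 2])
tau_inv = inverse(tau)
print(bottom_row(tau_inv, range(1, 8)))  # [4, 7, 2, 5, 1, 3, 6]
```

A dictionary has no preferred order of keys for lookup purposes, which mirrors the remark that the columns of the two-row notation may be listed in any order.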


¹ Algebraists sometimes write $\sigma(x)$ the other way round as $(x)\sigma$, so that $\sigma$ followed by
$\tau$ is written as $(x)\sigma\tau = ((x)\sigma)\tau$. This makes the rule for composition read more naturally,
so that $\sigma\tau$ means ‘first do $\sigma$, then $\tau$’. However, the notation $\sigma(x)$ is much more widely
accepted, so we accept the minor irritation of composing permutations from right to left.
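To calculate the composition of functions $\sigma \circ \tau$ we first do $\tau$ and then $\sigma$.
This involves running through the numbers $x = 1, x = 2, \ldots, x = 7$, working
out what $\tau(x)$ is, and then what happens to that number when we apply $\sigma$.
For example,

$$\begin{array}{ll}
\tau(1) = 5 & \text{then } \sigma(5) = 3\\
\tau(2) = 3 & \text{then } \sigma(3) = 5\\
\tau(3) = 6 & \text{then } \sigma(6) = 2\\
\tau(4) = 1 & \text{then } \sigma(1) = 7\\
\tau(5) = 4 & \text{then } \sigma(4) = 4\\
\tau(6) = 7 & \text{then } \sigma(7) = 1\\
\tau(7) = 2 & \text{then } \sigma(2) = 6
\end{array}$$

So

$$\sigma \circ \tau = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 3 & 5 & 2 & 7 & 4 & 1 & 6 \end{pmatrix}.$$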
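The table of steps is a loop over $x$: look up $\tau(x)$, then look up $\sigma$ of the result. A minimal Python sketch, storing each permutation as a `dict` from $x$ to its image (our own encoding; `compose` is a hypothetical name), reproduces the bottom row just computed.

```python
def compose(sigma, tau):
    """Return sigma o tau as a dict: first apply tau, then sigma."""
    return {x: sigma[tau[x]] for x in tau}


X = range(1, 8)
sigma = dict(zip(X, [7, 6, 5, 4, 3, 2, 1]))
tau = dict(zip(X, [5, 3, 6, 1, 4, 7, 2]))

st = compose(sigma, tau)
print([st[x] for x in X])  # [3, 5, 2, 7, 4, 1, 6]

# Composing in the other order (first sigma, then tau) gives a different permutation.
ts = compose(tau, sigma)
print([ts[x] for x in X])  # [2, 7, 4, 1, 6, 3, 5]
```

The order of the arguments follows the right-to-left convention of the text: `compose(sigma, tau)` means $\sigma \circ \tau$.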


Another way to see how to get this result is to rewrite $\sigma$ so that the top row
is listed in the same order as the bottom row of $\tau$, like this:

$$\tau = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 5 & 3 & 6 & 1 & 4 & 7 & 2 \end{pmatrix},$$

$$\sigma = \begin{pmatrix} 5 & 3 & 6 & 1 & 4 & 7 & 2 \\ 3 & 5 & 2 & 7 & 4 & 1 & 6 \end{pmatrix},$$

and then combine the top row of $\tau$ with the bottom row of $\sigma$, which gives
the same result. After a little practice you can do this by writing down the top
row in order $1, 2, 3, \ldots$ and tracing your finger along $\sigma$ to find what is beneath
each successive $\tau(x)$, writing this down on the bottom line.


Permutations as Cycles 


A permutation can be written in a more compact form by tracing where each 
element goes. For example, in the case of the permutation 


ee 123 45 67 
~\5 7314 62 
we have 1 goes to 5, 5 goes to 4, 4 goes to 1. At the same time 2 goes to 7 


and 7 goes to 2, while both 3 and 6 remain unchanged. We can represent the 
transformation as 
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13> 5 — 4—> 1,2 — 7 > 2,3 > 3, and 6 > 6. 


Each of these short lists determines a permutation in its own right, and is 
called a cycle. The first is written as (1 5 4) on the understanding that each 
number in the cycle goes to the next and the last number returns to the first. 
The product of these cycles is then written as 


$$(1\ 5\ 4)(2\ 7)(3)(6).$$


Because cycles with only one element effectively do nothing, they can be 
omitted to write the product as 


$$(1\ 5\ 4)(2\ 7).$$
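Tracing where each element goes until we return to the start is a simple algorithm. The Python sketch below follows that procedure: each cycle starts at the smallest element not yet visited, and cycles of length 1 are dropped, as in the text. Storing the permutation as a `dict` and the name `cycles` are our own choices for illustration.

```python
def cycles(perm):
    """Split a permutation (a dict x -> perm[x]) into disjoint cycles.

    Each cycle begins at its smallest element; fixed points are omitted.
    """
    seen = set()
    result = []
    for start in sorted(perm):
        if start in seen:
            continue
        cycle = [start]
        seen.add(start)
        x = perm[start]
        while x != start:  # follow start -> perm[start] -> ... back to start
            cycle.append(x)
            seen.add(x)
            x = perm[x]
        if len(cycle) > 1:
            result.append(tuple(cycle))
    return result


perm = dict(zip(range(1, 8), [5, 7, 3, 1, 4, 6, 2]))
print(cycles(perm))  # [(1, 5, 4), (2, 7)]
```

Every element is visited exactly once, so the loop terminates, and the cycles it produces are automatically disjoint.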


Remembering that composition reads from right to left, this notation oper- 
ating on an element x means first see what happens in the cycle (2 7) then in 
the cycle (1 5 4). For instance, operating on 4, the cycle (2 7) doesn’t change 
it, but the cycle (1 5 4) takes 4 to 1. 

Here the cycles are disjoint; that is, they have no elements in common.
For disjoint cycles, the order does not matter. However, if two cycles have an
element in common, then the order does matter. The product


(1 2)(2 3) 


operating on the element 2 first takes 2 to 3 in the cycle (2 3) and then 3 is 
unchanged by the cycle (1 2), so 2 goes to 3 overall. But the product 


(2 3)(1 2) 


operating on the element 2 takes 2 to 1 in the cycle (1 2), then 1 is unchanged
in the cycle (2 3), so 2 goes to 1 overall.

This means that the composition of two permutations $\sigma, \tau$ need not be
commutative, so we may have $\sigma \circ \tau \neq \tau \circ \sigma$.

On the other hand, it is easy to write down the inverse of a cycle: run through
the cycle in reverse order. For example, the inverse of (1 5 4) is (4 5 1). Check
this by working it through for yourself.
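Both claims, that $(1\ 2)(2\ 3)$ and $(2\ 3)(1\ 2)$ differ and that $(4\ 5\ 1)$ undoes $(1\ 5\ 4)$, can be confirmed by machine. In this Python sketch, a cycle is turned into a `dict` on a given set $X$ (our own encoding), and composition reads right to left as in the text. The helpers `from_cycle` and `compose` are hypothetical names.

```python
def from_cycle(cycle, X):
    """The permutation of X given by the cycle (c1 c2 ... ck); other points are fixed."""
    perm = {x: x for x in X}
    for i, c in enumerate(cycle):
        perm[c] = cycle[(i + 1) % len(cycle)]  # each entry goes to the next, the last to the first
    return perm


def compose(sigma, tau):
    """sigma o tau: first apply tau, then sigma."""
    return {x: sigma[tau[x]] for x in tau}


X = {1, 2, 3}
a = from_cycle((1, 2), X)
b = from_cycle((2, 3), X)
print(compose(a, b)[2])  # 3: (1 2)(2 3) sends 2 to 3
print(compose(b, a)[2])  # 1: (2 3)(1 2) sends 2 to 1

X7 = set(range(1, 8))
c = from_cycle((1, 5, 4), X7)
c_inv = from_cycle((4, 5, 1), X7)
print(compose(c, c_inv) == {x: x for x in X7})  # True
```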


Group Properties for Permutations 


Theorem 13.2 established three basic properties of the set of all permutations 
of any set X, which we write as follows: 


Definition 13.3: A set G of permutations of a set X is a permutation group 
or has the group property if: 
(PG1) The identity $i_X \in G$.
(PG2) If $\sigma \in G$ then $\sigma^{-1} \in G$.
(PG3) If $\sigma, \tau \in G$ then $\sigma \circ \tau \in G$.


Classically, X was always taken to be a finite set, and in this case the first 
two properties are consequences of the third (exercise 12). This is why the 
phrase ‘the group property’ came to be used. 

This definition applies to the set of all permutations on $X$, which we denote
by $S_X$. When $X = \{1, 2, 3, \ldots, n\}$ we use the simpler notation $S_n$. We can then
restate theorem 13.2 as:


Theorem 13.4: For any $X$, the set $S_X$ is a permutation group.


We have worded the definition of a permutation group on $X$ carefully, so that
it does not have to be the whole set of permutations on $X$. For instance, if we
take the subset $\{i_X, s\}$ of the set $S_7$ of all permutations of $\{1, 2, 3, 4, 5, 6, 7\}$,
where $s = (2\ 3)$, we find that $(2\ 3)(2\ 3)$ is the identity, so $s^{-1} = s$. In fact, the
set consisting of $i_X$ and $s$ satisfies all of the properties in definition 13.3, so
it is itself a permutation group.

The following subsets of S; are also permutation groups where i denotes 
the identity: 


You can check that these satisfy definition 13.3 by hand. 
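For small finite sets, checking (PG1)–(PG3) by hand can also be mechanised. The Python sketch below is only an illustration under our own conventions: the points are relabelled $0, \ldots, n-1$ instead of $1, \ldots, n$, and a permutation is stored as the tuple of its images. `is_permutation_group` and its helpers are hypothetical names.

```python
from itertools import permutations


def compose(s, t):
    """s o t for permutations of {0, ..., n-1} stored as tuples: first t, then s."""
    return tuple(s[t[x]] for x in range(len(t)))


def inverse(s):
    """The inverse permutation: if s sends x to y, the inverse sends y to x."""
    inv = [0] * len(s)
    for x, y in enumerate(s):
        inv[y] = x
    return tuple(inv)


def is_permutation_group(G, n):
    """Check (PG1)-(PG3) for a finite set G of permutations of {0, ..., n-1}."""
    identity = tuple(range(n))
    return (identity in G                                        # (PG1)
            and all(inverse(s) in G for s in G)                  # (PG2)
            and all(compose(s, t) in G for s in G for t in G))   # (PG3)


# The identity together with s = (2 3) in S_7, relabelled to swap 1 and 2.
print(is_permutation_group({tuple(range(7)), (0, 2, 1, 3, 4, 5, 6)}, 7))  # True
# The whole of S_3.
print(is_permutation_group(set(permutations(range(3))), 3))  # True
# Two different transpositions without their product: not closed.
print(is_permutation_group({(0, 1, 2), (1, 0, 2), (0, 2, 1)}, 3))  # False
```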

To be able to operate fluently with permutation groups, it may be useful 
to imagine them visually in a way that enables you to grasp the structure. 
For example, we might visualise $S_3$ by permuting the three corners of an
equilateral triangle ABC. Initially we mark the three vertices as lying at po- 
sitions 1, 2, 3. 


[Figure: an equilateral triangle with its corners at Position 1, Position 2, and Position 3]


Fig. 13.1 Permuting the corners of an equilateral triangle
Then we pick up the triangle and rotate it or turn it over to place the
corners in different positions to give six possible symmetries:


the identity $i$, which leaves the triangle unchanged.
Two are rotational symmetries:


clockwise rotation $\rho$ by a third of a full turn is the permutation $(1\ 2\ 3)$,
anticlockwise rotation $\lambda$ by a third of a turn is the permutation $(1\ 3\ 2)$.


Three are mirror symmetries:


flip $\mu_A$ over the line of symmetry through $A$, given by $(2\ 3)$,
flip $\mu_B$ over the line of symmetry through $B$, given by $(3\ 1)$,
flip $\mu_C$ over the line of symmetry through $C$, given by $(1\ 2)$.


However, by performing various combinations of a rotation and a mirror image,
we can obtain all of these symmetries. For instance, using combinations
of the rotation $\rho$ and the mirror image $\mu = \mu_A$, we can obtain:


the identity $i$

the rotation $\rho$ as $(1\ 2\ 3)$

the rotation $\lambda$ as $(1\ 3\ 2)$ or $\rho^2$

the flip $\mu_A$ as $(2\ 3)$, which is $\mu$ itself

the flip $\mu_B$ as $(1\ 3)$, which can also be written as $\mu\rho$ or $\rho^2\mu$
the flip $\mu_C$ as $(1\ 2)$ or $\rho\mu = \mu\rho^2$.


You should check these statements for yourself. Either write the permutations
as cycles and carry out the composition symbolically as explained
above, or cut out an equilateral triangle from paper or card and physically
carry out the sequences of rotations and flips.

This calculation shows that all six elements of the group $S_3$ may be written
in the form $\rho^p\mu^q$ where $0 \le p \le 2$ and $0 \le q \le 1$. In fact, all we need are
the expressions

$$i, \quad \rho, \quad \rho^2, \quad \mu, \quad \rho\mu, \quad \rho^2\mu.$$

This observation simplifies computations with the elements of $S_3$. We can
think of them as being ‘generated’ by the two elements $\rho, \mu$ subject to the
‘relations’

$$\rho^3 = \mu^2 = i, \quad \mu\rho = \rho^2\mu.$$


In general we can simplify any product of powers of $\rho, \mu$ to one of these six distinct
elements by using these relations. For instance, if we have a product such as

$$\rho\mu\rho$$

then we can write it as

$$\rho(\mu\rho) = \rho(\rho^2\mu) = \rho^3\mu = \mu.$$

Not only do we have $\mu\rho = \rho^2\mu$, we also have $\mu\rho^2 = \rho\mu$. So manipulating the
symbols in this case is easy. The commutative law does not hold in general,
so we cannot change the order of the terms in a product, but we can pass a
term $\rho$ over a term $\mu$ provided we replace $\rho$ by $\rho^2$ when we do it. In this way
we can reduce any product of powers of $\rho$ and $\mu$ to the form $\rho^p\mu^q$ where
$0 \le p \le 2$ and $0 \le q \le 1$.
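The relations, and the claim that the six expressions $\rho^p\mu^q$ are all different, can be confirmed by direct computation. This Python sketch assumes $\rho = (1\ 2\ 3)$ and $\mu = (2\ 3)$ as above, stores permutations as dicts, and composes right to left; the helpers `compose` and `power` are hypothetical names.

```python
def compose(*perms):
    """Compose permutations right to left: compose(s, t)(x) = s(t(x))."""
    def apply(x):
        for p in reversed(perms):  # the rightmost permutation acts first
            x = p[x]
        return x
    return {x: apply(x) for x in perms[0]}


X = (1, 2, 3)
i = {x: x for x in X}
rho = {1: 2, 2: 3, 3: 1}  # the cycle (1 2 3)
mu = {1: 1, 2: 3, 3: 2}   # the cycle (2 3)


def power(p, k):
    """p composed with itself k times (the identity when k = 0)."""
    result = i
    for _ in range(k):
        result = compose(p, result)
    return result


# The six expressions rho^p mu^q with 0 <= p <= 2 and 0 <= q <= 1.
elements = [compose(power(rho, p), power(mu, q)) for p in range(3) for q in range(2)]
print(len({tuple(e[x] for x in X) for e in elements}))  # 6

print(power(rho, 3) == i and power(mu, 2) == i)   # True: rho^3 = mu^2 = i
print(compose(mu, rho) == compose(rho, rho, mu))  # True: mu rho = rho^2 mu
print(compose(rho, mu, rho) == mu)                # True: rho mu rho = mu
```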

This rule of thumb only works in this particular group, but it is fruitful 
to think of various other groups in terms of generators and relations. For 
instance, the group of symmetries of a regular polygon with $n$ sides is generated
by two symmetries, a rotational symmetry $\rho$ shifting one corner round
to the next position and a mirror symmetry $\mu$ flipping the polygon over an
axis of symmetry through one of the corners.


[Figure: a regular polygon with corners at Position 1, Position 2, Position 3, ..., Position n, showing the rotation $\rho$ and the flip $\mu$]


Fig. 13.2 Symmetries of a regular polygon


This group is generated by $\rho$ and $\mu$ subject to the relations

$$\rho^n = \mu^2 = i, \quad \rho\mu = \mu\rho^{n-1}.$$
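These relations can be tested for many values of $n$ at once. In the Python sketch below, which relies on assumptions of our own, the corners are labelled $0, \ldots, n-1$ (corner 0 being the text's Position 1), $\rho$ sends corner $k$ to $k + 1 \bmod n$, and $\mu$ is the flip $k \mapsto -k \bmod n$ in the axis through corner 0. For $n = 3$ these are $(1\ 2\ 3)$ and $(2\ 3)$ after relabelling. The check also counts the expressions $\rho^p\mu^q$, using the standard fact that this symmetry group has $2n$ elements. All function names are hypothetical.

```python
def dihedral_generators(n):
    """rho and mu for a regular n-gon with corners labelled 0, ..., n-1."""
    rho = tuple((k + 1) % n for k in range(n))  # shift each corner to the next position
    mu = tuple((-k) % n for k in range(n))      # flip in the axis through corner 0
    return rho, mu


def compose(s, t):
    """s o t for permutations stored as tuples: first t, then s."""
    return tuple(s[t[k]] for k in range(len(t)))


def power(s, m):
    """s composed with itself m times."""
    result = tuple(range(len(s)))
    for _ in range(m):
        result = compose(s, result)
    return result


def check_relations(n):
    rho, mu = dihedral_generators(n)
    identity = tuple(range(n))
    elements = {compose(power(rho, p), power(mu, q))
                for p in range(n) for q in range(2)}
    return (power(rho, n) == identity
            and power(mu, 2) == identity
            and compose(rho, mu) == compose(mu, power(rho, n - 1))
            and len(elements) == 2 * n)


print(all(check_relations(n) for n in range(3, 11)))  # True
```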


In other groups, with many generators and relations, calculations like 
these can become very complicated. It then becomes essential to give a for- 
mal definition of the general concept of group, and to build up theorems 
about its structure. 


Axioms for a Group


The three conditions that define a permutation group prove to be good candidates
for a more general mathematical structure. However, this generality
requires us to get rid of the condition that the elements being operated on
are permutations. Indeed, we do not even require them to be functions.
Moreover, the operation concerned need not be composition.

Historically, many examples arose in various areas of algebra, complex 
analysis, geometry, and topology that essentially satisfied properties like 
those in the definition of a group of permutations. 

For example, the set Z of integers has the following properties: 


If $n \in \mathbb{Z}$ then $0 + n = n$.
If $n \in \mathbb{Z}$ then $n + (-n) = 0$.
If $m, n \in \mathbb{Z}$ then $m + n \in \mathbb{Z}$.


The first states that the analogue of the identity element is the number 0, be- 
cause adding 0 maps n to itself. The second similarly states that the ‘inverse’ 
of n is -n. And the third is like the condition on composing two permu- 
tations or rigid motions, except that now we add the numbers instead of 
composing them. 

The set of rational numbers has similar properties with respect to addition. 
Moreover, the set Q\ {0} of non-zero rational numbers has similar properties 
with respect to multiplication: 


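If $r \in \mathbb{Q} \setminus \{0\}$ then $1r = r$.
If $r \in \mathbb{Q} \setminus \{0\}$ then $r(1/r) = 1$.
If $r, s \in \mathbb{Q} \setminus \{0\}$ then $rs \in \mathbb{Q} \setminus \{0\}$.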


Now the ‘identity’ is 1, the inverse of r is its reciprocal 1/r (hence the restric- 
tion to non-zero numbers), and we use multiplication instead of compos- 
ition. We encountered these features of Q in chapter 9. 

These examples are the tip of a gigantic iceberg. During the early part of 
the twentieth century it became clear that it was pointless to keep proving 
the same theorems over and over again in many different contexts, especially 
since the proofs were often identical. The whole topic was crying out for an 
axiomatic approach, which would bring all of these different systems under 
one heading and define concepts and prove theorems with as much general- 
ity as possible. The history is complicated, but the final outcome is amazingly 
simple. 

Less obvious—or perhaps too obvious, because it happens by default in all 
of the above examples—is the associative law. This law holds for composition 
of functions (chapter 5, proposition 5.14), and addition, and multiplication 
(see chapter 9). 

One part of the group property—the one that got the whole subject 
started—is so basic that it is now incorporated directly into the definition 
of a group. This is ‘closure’ under the operation: the idea that when you 
compose permutations, or add integers, or multiply non-zero rationals, you
get another object of the same kind. Recall from chapter 5 that a binary
operation on a set $A$ is a function $f : A \times A \to A$. As we stated when
defining a binary operation, examples include composition, addition, and
multiplication.


Definition 13.5: A group is a set G together with a binary operation * on 
G, satisfying the following conditions: 


(G1) There exists an identity: an element $1_G \in G$ such that for all $g \in G$

$$1_G * g = g, \quad g * 1_G = g.$$

(G2) For all $g \in G$ there exists an inverse element $g^{-1} \in G$ such that

$$g * g^{-1} = 1_G, \quad g^{-1} * g = 1_G.$$

(G3) The operation $*$ is associative: for all $g, h, k \in G$

$$(g * h) * k = g * (h * k).$$


If we want to be really formal, we can write the group as a pair (G, *) to 
make it clear which binary operation is being considered. 


Examples 13.6: All the following are groups:


• the set $S_X$ of all permutations of a set X with the binary operation $\circ$

• the set $\mathbb{Z}$ with the binary operation $+$

• the set $\mathbb{Q}$ with the binary operation $+$

• the set $\mathbb{Q} \setminus \{0\}$ with the binary operation $\times$

• the real numbers $\mathbb{R}$ under (that is, with respect to the operation of) addition

• the complex numbers $\mathbb{C}$ under addition

• the integers $\mathbb{Z}_n$ modulo n under addition

• the non-zero real numbers $\mathbb{R} \setminus \{0\}$ under multiplication

• the non-zero complex numbers $\mathbb{C} \setminus \{0\}$ under multiplication

• the non-zero integers $\mathbb{Z}_p \setminus \{0\}$ modulo p under multiplication when p is prime.


Only the last of these needs any effort to check and we leave this as an 
exercise. 
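The exercise can be sanity-checked by machine for small moduli, although a finite search is evidence rather than a proof. The Python sketch below is our own illustration (the helper names `is_group` and `nonzero_residues_mod` are hypothetical, not from the text): it tests closure and the axioms (G1)-(G3) by brute force. It also shows why primality matters: modulo 4 or 6 the product of two non-zero residues can be 0, so closure fails.

```python
from itertools import product

def is_group(elements, op):
    """Brute-force check of closure and axioms (G1)-(G3) on a finite set."""
    elems = list(elements)
    # Closure: op must send every pair of elements back into the set.
    if any(op(a, b) not in elements for a, b in product(elems, repeat=2)):
        return False
    # (G3) associativity, checked on every triple.
    if any(op(op(a, b), c) != op(a, op(b, c)) for a, b, c in product(elems, repeat=3)):
        return False
    # (G1) look for a two-sided identity.
    identities = [e for e in elems if all(op(e, g) == g == op(g, e) for g in elems)]
    if not identities:
        return False
    e = identities[0]
    # (G2) every element needs a two-sided inverse.
    return all(any(op(g, h) == e == op(h, g) for h in elems) for g in elems)

def nonzero_residues_mod(n):
    """Z_n minus {0}, with multiplication modulo n."""
    return set(range(1, n)), (lambda a, b: (a * b) % n)

for n in [2, 3, 4, 5, 6, 7]:
    elems, op = nonzero_residues_mod(n)
    print(n, is_group(elems, op))   # True for 2, 3, 5, 7; False for 4, 6
```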


Many of these examples satisfy the commutative law $g * h = h * g$ and are given a special name:

Definition 13.7: If a group G satisfying (G1)-(G3) of definition 13.5 also satisfies the commutative law

$$g * h = h * g \quad \text{for all } g, h \in G,$$

then G is an abelian group (named after the mathematician Niels Henrik Abel).


The definition of a group is clearer if we introduce an unambiguous and
general notation, such as $*$, for the binary operation. However, its continued
use is clumsy, and we normally replace $f * g$ by $fg$ unless there is a serious
danger of this being confused with some other meaning of the product. We
also replace $1_G$ by 1, which usually causes no problem when the operation
is thought of as a ‘product’, although we realise that in some of the above 
examples 1 is an identity map, in others it may be 1 and in others it is 0. In 
contexts where the main groups that arise are all abelian, it is common to 
use + for the binary operation and 0 for the identity element. We reserve the 
right to use whatever notation is appropriate in any given context. 

We do not require the set G to be finite. It is finite when $G = S_n$, but
$\mathbb{Z}$ and $\mathbb{Q}$ are infinite.

In the early stages of an axiomatic theory, quite a lot of effort has to be 
expended to sort out basic book-keeping issues: making sure that results that 
seem obvious are actually true. The commutative law is an instructive ex- 
ample, because it is often false. So any deduction that tacitly makes use of the 
commutative law must be viewed with suspicion unless there is another way 
to get the same result, or the group is known to be abelian. For example, our 
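algebraic instincts could easily lead us to write

$$(fg)^2 = f^2g^2$$

for two elements f, g of some group G. But this equation is not always correct, as can be seen by working out $(\rho\mu)^2$ and $\rho^2\mu^2$ in the group $G = S_3$. Here $(fg)^2 = fgfg$ while $f^2g^2 = ffgg$; these agree when f and g commute, but not in general.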
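The computation in $S_3$ can be checked by machine. In the sketch below (our own encoding, not the book's) a permutation of {0, 1, 2} is a tuple whose entry at position x is the image of x, and `compose(p, q)` applies q first, assuming that $\sigma\tau$ means $\sigma \circ \tau$. Which tuples we call $\rho$ and $\mu$ is an assumed labelling: any rotation of order 3 and any reflection behave the same way.

```python
def compose(p, q):
    """(p o q)(x) = p(q(x)): apply q first, then p."""
    return tuple(p[q[x]] for x in range(len(q)))

# Assumed labelling of elements of S_3 acting on the points 0, 1, 2.
i = (0, 1, 2)      # identity
rho = (1, 2, 0)    # rotation: 0 -> 1 -> 2 -> 0
mu = (0, 2, 1)     # reflection fixing 0 and swapping 1, 2

rho_mu = compose(rho, mu)
lhs = compose(rho_mu, rho_mu)                      # (rho mu)^2
rhs = compose(compose(rho, rho), compose(mu, mu))  # rho^2 mu^2
print(lhs, rhs)    # prints (0, 1, 2) (2, 0, 1)
```

Since $\rho\mu$ is itself a reflection, its square is the identity, whereas $\rho^2\mu^2 = \rho^2$ is not.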

Our next task is to sort out a number of useful properties that are true. We 
collect them in one jumbo package: 


Theorem 13.8: Let G be a group. Then:


(1) The identity element is unique. That is, if $fg = g$ for all $g \in G$, or just for one such g, then $f = 1$. The same goes if $gf = g$.
(2) The inverse of any element of G is unique.


(3) If $f, g \in G$ then $(fg)^{-1} = g^{-1}f^{-1}$.

(4) General associative law: If brackets are inserted into any product $g_1g_2\cdots g_n$ so that it makes sense, the result is always the same. We can therefore omit the brackets and write this unique value as $g_1g_2\cdots g_n$.

(5) General commutative law: If the elements $g_1, g_2, \ldots, g_n \in G$ commute, that is, $g_ig_j = g_jg_i$ for all $1 \le i, j \le n$, then the product $g_1g_2\cdots g_n$ has the same value if the elements are permuted in any way.


Proof: We prove parts (1)-(3) and outline inductive proofs for (4)-(5), with discussion and examples.


(1) If $fg = g$ then
$$f = f1 = f(gg^{-1}) = (fg)g^{-1} = gg^{-1} = 1.$$
Similarly if $gf = g$.

(2) Suppose that $gh = 1$. Then
$$g^{-1} = g^{-1}1 = g^{-1}(gh) = (g^{-1}g)h = 1h = h,$$
and similarly if $hg = 1$.

(3) For all f and g,
$$(g^{-1}f^{-1})(fg) = g^{-1}(f^{-1}f)g = g^{-1}(1g) = g^{-1}g = 1.$$
Now use the uniqueness of inverses to conclude that $g^{-1}f^{-1} = (fg)^{-1}$.

(4) We already know that we can write the product of three terms f, g, h as fgh (without brackets) by the associative law. Suppose that, for some $n \ge 3$, every product of at most n terms has the same value wherever the brackets are placed. A product of n + 1 terms $g_1, g_2, \ldots, g_n, g_{n+1}$ is then either of the form $(g_1 \cdots g_n)g_{n+1}$, where the product of the n terms $g_1, \ldots, g_n$ is the same whatever the position of the brackets, or of the form $(g_1 \cdots g_r)(g_{r+1} \cdots g_{n+1})$ where $r < n$. Since each bracket has at most n terms, its value does not depend on the bracketing, and we may write
$$g = g_1 \cdots g_r, \quad h = g_{r+1} \cdots g_n, \quad k = g_{n+1}$$
and use the associative law $g * (h * k) = (g * h) * k$ to write
$$(g_1 \cdots g_r)(g_{r+1} \cdots g_{n+1}) = (g_1 \cdots g_n)g_{n+1},$$
so the associative law holds for n + 1 terms and, by induction, it holds for all $n \ge 3$.

(5) Again, the general case can be formulated for elements $g_1, g_2, \ldots, g_n$ in any order, and proved using induction on $n \ge 2$. As the elements commute in pairs, this is true for $n = 2$. For the induction step, use a series of swaps of adjacent elements to move $g_1$ to the front; then the induction hypothesis shows that the remaining elements can be arranged in the order $g_2 \cdots g_n$, giving $g_1g_2\cdots g_n$, and the proof is complete.


From now on we use all of the above facts without further comment as part 
of the context of building the theory of groups. They are basic reflexes for 
every group-theorist. If you’re wondering why we’ve bothered with, say, the 
general associative law, find out what calculations look like when the associa- 
tive law is false. Look up ‘non-associative algebra’ on the internet, or borrow 
a suitable book. We can tell you the main point now: it gets very complicated. 
With enough motivation, you can learn to love it, and some special kinds 
of non-associative operation are actually very useful, although they usually 
satisfy some weaker version of associativity. Non-associative algebra is an 
acquired taste. 
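One way to see the general associative law at work is to enumerate every bracketing of a product and collect the resulting values. The sketch below (our own illustration; `all_bracketings` is a hypothetical helper) does this recursively, splitting the product at its outermost operation as in the proof of theorem 13.8(4). For composition of permutations in $S_3$ only one value appears, while for subtraction, which is not associative, several do.

```python
from functools import lru_cache

def all_bracketings(items, op):
    """Set of values of items[0] op items[1] op ... op items[-1] over every
    way of inserting brackets (the order of the items is never changed)."""
    items = tuple(items)

    @lru_cache(maxsize=None)
    def values(lo, hi):
        # All values of the product of items[lo:hi].
        if hi - lo == 1:
            return frozenset([items[lo]])
        out = set()
        for split in range(lo + 1, hi):   # outermost product (left)(right)
            for a in values(lo, split):
                for b in values(split, hi):
                    out.add(op(a, b))
        return frozenset(out)

    return values(0, len(items))

def compose(p, q):
    """(p o q)(x) = p(q(x)) for permutations stored as tuples of images."""
    return tuple(p[q[x]] for x in range(len(q)))

rho, mu = (1, 2, 0), (0, 2, 1)        # an assumed labelling of two elements of S_3
perms = [rho, mu, rho, rho, mu]
print(len(all_bracketings(perms, compose)))                          # prints 1
print(sorted(all_bracketings([10, 4, 3, 2], lambda a, b: a - b)))    # prints [1, 5, 7, 11]
```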


Subgroups 


Recall that we discovered that several subsets of S3 also form permutation 
groups under the same operation, which in this case is composition. This 
phenomenon is very common, so we give it a name: 


Definition 13.9: Let G be a group. A subset $H \subseteq G$ is a subgroup of G if:


(1) $1_G \in H$.
(2) If $h \in H$ then $h^{-1} \in H$.
(3) If $h, k \in H$ then $hk \in H$.


That is, H contains the identity and is closed under inverses and products. 
There is a more efficient way to verify that a subset is a subgroup: 


Theorem 13.10: A subset $H \subseteq G$ is a subgroup if and only if H is non-empty and $hk^{-1} \in H$ whenever $h, k \in H$.


Proof: Suppose that H is a subgroup. Then $1_G \in H$ so H is non-empty. Moreover, if $h, k \in H$ then $k^{-1} \in H$, so $hk^{-1} \in H$.

Conversely, suppose H is non-empty and $hk^{-1} \in H$ whenever $h, k \in H$. Since H is non-empty there exists $h \in H$. Set $k = h$: then $1_G = hh^{-1} \in H$. Then set $h = 1_G$ to get $k^{-1} \in H$ for every $k \in H$. Finally, if $h, k \in H$ then $k^{-1} \in H$, so $hk = h(k^{-1})^{-1} \in H$.
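Theorem 13.10 gives a single test that is easy to automate. As an illustration, the following sketch (our own; the helpers `compose`, `inverse` and `is_subgroup` are hypothetical names) applies it to every non-empty subset of $S_3$, with a permutation of {0, 1, 2} encoded as the tuple of its images. It is a brute-force check, practical only for very small groups.

```python
from itertools import combinations, permutations

def compose(p, q):
    """(p o q)(x) = p(q(x)) for permutations stored as tuples of images."""
    return tuple(p[q[x]] for x in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for x, y in enumerate(p):
        inv[y] = x            # p sends x to y, so the inverse sends y to x
    return tuple(inv)

def is_subgroup(H):
    """Theorem 13.10: H is a subgroup iff H is non-empty and
    h k^(-1) lies in H whenever h and k lie in H."""
    return bool(H) and all(compose(h, inverse(k)) in H for h in H for k in H)

S3 = list(permutations(range(3)))
subgroups = [set(c) for r in range(1, len(S3) + 1)
             for c in combinations(S3, r) if is_subgroup(set(c))]
print(sorted(len(H) for H in subgroups))    # prints [1, 2, 2, 2, 3, 6]
```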


Proposition 13.11: Suppose that H is a subgroup of G. Then H is a group under the operation $*$ of G, restricted to $H \times H$. It has the same identity element and inverses as G.

Proof: Check the axioms systematically and verify the properties required for identity and inverses. All are straightforward.


One important way to obtain subgroups of G is to pick an element g and see what else the subgroup must contain. There’s 1, of course, but also

$$g, \quad g^2 = gg, \quad g^3 = ggg, \quad g^{-1}, \quad g^{-2} = g^{-1}g^{-1}, \quad g^{-3} = g^{-1}g^{-1}g^{-1},$$

and so on. This motivates the definition of powers $g^n$ of g, for any integer n (positive, negative, or zero):


Definition 13.12: Let G be a group and $g \in G$. For any $n \in \mathbb{Z}$ define $g^n$ inductively by:


(1) $g^0 = 1$
(2) $g^{n+1} = gg^n$ $(n \ge 0)$
(3) $g^n = (g^{-n})^{-1}$ $(n < 0)$.


We would be astonished if the following theorem were not true. Fortu- 
nately it is. 


Theorem 13.13: Let G be a group, $g \in G$, and $m, n \in \mathbb{Z}$. Then $g^mg^n = g^{m+n}$.
Proof: Use induction on n.


We introduce the notation
$$\langle g \rangle = \{g^n \mid n \in \mathbb{Z}\}$$

because the set of all powers of g is always a subgroup:


Theorem 13.14: Let G be a group and let $g \in G$. Then $\langle g \rangle$ is a subgroup.


Proof: Take any two elements $g^m, g^n \in \langle g \rangle$. Then $g^m(g^n)^{-1} = g^{m-n} \in \langle g \rangle$. Now appeal to theorem 13.10.


Definition 13.15: Let G be a group and $g \in G$. We call $\langle g \rangle$ the subgroup generated by g.

Clearly any subgroup that contains g must contain $\langle g \rangle$, so $\langle g \rangle$ is the unique smallest subgroup that contains g. Moreover, it is commutative by theorem 13.13.

If we throw in an extra element, not a power of g, nothing as simple can 
be proved. The possibilities are very complicated, except for commutative 
groups. 

What does $\langle g \rangle$ look like? Let’s try a few examples. Suppose $G = S_3$ and $g = \rho$. The powers of $\rho$ are $\rho^0 = i$, $\rho^1 = \rho$, $\rho^2 = \rho\rho$. But $\rho^3 = i$, and from here on, the powers of $\rho$ just cycle repeatedly through $i, \rho, \rho^2$. Moreover, $\rho^{-1} = \rho^2$, so negative powers provide nothing new. In short, in this case

$$\langle \rho \rangle = \{i, \rho, \rho^2\}.$$

This should not be a great surprise since we already know that $\{i, \rho, \rho^2\}$ is a subgroup, so it must contain all powers of $\rho$. What drives this phenomenon is the fact that $\rho^3 = i$.

In contrast, suppose that $G = \mathbb{Z}$ under addition and $g = 1$. Then $g^n = n$ because the group operation is addition. Now all the ‘powers’ $g^n$ are distinct, and the subgroup generated by 1 is $\langle 1 \rangle = \mathbb{Z}$. This is an infinite group.

A group $\langle g \rangle$ generated by a single element g is called a cyclic group. It consists of all the powers of g. Its structure is easy to classify:


Proposition 13.16: A cyclic group $\langle g \rangle$ generated by a single element g is either finite with n distinct elements $\{1, g, g^2, \ldots, g^{n-1}\}$ where $g^n = 1$, or it is infinite of the form $\{g^n \mid n \in \mathbb{Z}\}$ where $g^m \ne g^n$ for $m \ne n$.


Proof: Either there are two distinct integers m and n for which $g^m = g^n$, or all powers of g are distinct. In the first case we may take $n < m$. If $k = m - n$ then $k > 0$ and $g^k = g^m(g^n)^{-1} = 1$. Now let n be the smallest positive integer such that $g^n = 1$. All powers $g^r$ for $0 \le r < n$ must then be different, for if $g^r = g^s$ for $0 \le r < s < n$, then $g^{s-r} = 1$, where $0 < s - r < n$, contrary to n being the smallest positive power of g with this property. Moreover, every power of g is one of these: writing $j = qn + r$ with $0 \le r < n$ gives $g^j = (g^n)^qg^r = g^r$. In this case we therefore have the cyclic group $\langle g \rangle$ with n distinct elements $\{1, g, g^2, \ldots, g^{n-1}\}$ where $g^n = 1$.

On the other hand, if all the powers are distinct, then $\langle g \rangle$ is $\{g^n \mid n \in \mathbb{Z}\}$ where $g^m \ne g^n$ for $m \ne n$.
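In a finite group only the first alternative of proposition 13.16 can occur, so $\langle g \rangle$ can be listed by multiplying by g until the identity reappears. The sketch below (our own; `cyclic_subgroup` is a hypothetical helper, and permutations of {0, 1, 2} are assumed to be stored as tuples of images) does this for elements of $S_3$; the length of the list is the number n in the proposition.

```python
def compose(p, q):
    """(p o q)(x) = p(q(x)) for permutations stored as tuples of images."""
    return tuple(p[q[x]] for x in range(len(q)))

def cyclic_subgroup(g):
    """Return [1, g, g^2, ..., g^(n-1)], where n is the least n >= 1 with g^n = 1.
    For a permutation of a finite set such an n always exists."""
    identity = tuple(range(len(g)))
    powers = [identity]
    current = g
    while current != identity:
        powers.append(current)
        current = compose(g, current)    # g^(k+1) = g g^k
    return powers

rho, mu = (1, 2, 0), (0, 2, 1)           # assumed labelling of a rotation and a reflection
print(len(cyclic_subgroup(rho)), len(cyclic_subgroup(mu)))    # prints 3 2
```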


Isomorphisms and Homomorphisms


Sometimes two technically different groups have essentially the same structure. For example, the subgroups $\{i, (2\ 3)\}$, $\{i, (3\ 1)\}$, $\{i, (1\ 2)\}$ of $S_3$ all consist of two elements, the identity and a second element whose square is the identity.

In these examples, the groups concerned work in the same way, except for a change of notation. This preserves the operations in the sense that changing notation for two elements and then multiplying them has the same effect as multiplying them and then changing notation. This motivates the following definition:


Definition 13.17: An isomorphism between two groups G, H is a bijection
$$\phi: G \to H$$

such that

$$\phi(g_1g_2) = \phi(g_1)\phi(g_2) \quad \text{for all } g_1, g_2 \in G.$$

If such a bijection $\phi$ exists, we say that G is isomorphic to H. Symbolically, we write $G \cong H$.


If two groups are isomorphic, all of their abstract properties—those that 
do not depend on the notation—are the same. Moreover, corresponding 
elements have the same abstract properties. The next theorem lists some 
examples. 


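Theorem 13.18: Let G, H be groups and suppose that there is an isomorphism $\phi: G \to H$. Then:


(1) $\phi(1_G) = 1_H$.

(2) If $g \in G$ then $\phi(g^n) = (\phi(g))^n$ for all $n \in \mathbb{Z}$.

(3) If $g \in G$ then $\phi(\langle g \rangle) = \langle \phi(g) \rangle$.

(4) If K is a subgroup of G then $\phi(K)$ is a subgroup of H.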

(4) IfK is a subgroup of G then ¢(K) is a subgroup of H. 


Proof: We leave the proofs, which are straightforward, as exercises. 


More generally, we can consider a map $\phi: G \to H$ between groups which
preserves the operation but need not be bijective.


Definition 13.19: A homomorphism between two groups G, H is a map $\phi: G \to H$ such that
$$\phi(g_1g_2) = \phi(g_1)\phi(g_2) \quad \text{for all } g_1, g_2 \in G.$$


If $\phi$ is injective, then it is a monomorphism. If $\phi$ is surjective then it is an epimorphism.


For example, inclusion $i: \mathbb{Z} \to \mathbb{Q}$ from the integers under addition to the rationals under addition is a monomorphism. The map $\phi: \mathbb{Z} \to \mathbb{Z}_n$ from the integers under addition to the integers modulo n under addition, mapping an integer to its remainder modulo n, is an epimorphism.

Monomorphisms and epimorphisms are important concepts in group theory. For example, a monomorphism $\phi: G \to H$ is an isomorphism between G and the image $\phi(G) = \{\phi(g) \in H \mid g \in G\}$. This lets us identify G with the subgroup $\phi(G)$ of H, because there is a bijection between the elements and a precise correspondence between the operations.

Now our careful definition of a permutation group pays off. Recall that a 
permutation group on a set X was defined to be a set G of permutations of X 
satisfying: 


(1) the identity $i_X \in G$
(2) if $\sigma \in G$ then $\sigma^{-1} \in G$
(3) if $\sigma, \tau \in G$ then $\sigma \circ \tau \in G$.


This can now be seen to be a subgroup of the permutation group $S_X$. More
generally, we can prove:


Theorem 13.20 (Structure Theorem for a Group as a Group of Permutations): Every group is isomorphic to a permutation group.


Proof: Let G be a group. For fixed but arbitrary $g \in G$ define
$$\pi_g: G \to G$$
by
$$\pi_g(x) = gx \quad (x \in G).$$

Informally, this map is ‘left multiplication by g’.

This map is clearly injective: if $\pi_g(x) = \pi_g(y)$ then $gx = gy$, so $g^{-1}gx = g^{-1}gy$ and $x = y$. It is also surjective: if $y \in G$ then, for $x = g^{-1}y$, $\pi_g(x) = g(g^{-1}y) = y$. So $\pi_g$ is a bijection and therefore a permutation of the set G.

Define the map $\phi: G \to S_G$ by $\phi(g) = \pi_g$. We claim that $\phi$ is a monomorphism.

The map $\phi$ is injective: if $\phi(g) = \phi(h)$, then $gx = hx$ for all $x \in G$, so $gxx^{-1} = hxx^{-1}$ and $g = h$. To show that it is a homomorphism, observe that $\phi(hg)$ maps x to $(hg)x$, and
$$(hg)x = h(gx) = \pi_h(\pi_g(x)) \quad \text{for all } x \in G,$$
so
$$\phi(hg) = \phi(h) \circ \phi(g)$$
and $\phi$ is a homomorphism.

Being an injective homomorphism, $\phi$ is a monomorphism from G to $S_G$. It is therefore an isomorphism from G to its image, the subgroup $\phi(G)$ of $S_G$. By definition, this is an isomorphism between G and a permutation group.
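The construction in the proof is concrete enough to compute. The sketch below (our own illustration; `left_regular` is a hypothetical helper name, and the choice of $\mathbb{Z}_4$ under addition is arbitrary) builds $\pi_g$ for each element, records each $\pi_g$ as a permutation of the positions 0 to 3, and checks that $g \mapsto \pi_g$ is injective and respects the operation. In additive notation $\pi_g(x) = g + x$.

```python
from itertools import product

def left_regular(elements, op):
    """Send each g to the permutation pi_g(x) = g*x of `elements`, recorded
    as a tuple of positions (the map used in the proof of theorem 13.20)."""
    position = {x: k for k, x in enumerate(elements)}
    return {g: tuple(position[op(g, x)] for x in elements) for g in elements}

def compose(p, q):
    """(p o q)(x) = p(q(x)) for permutations stored as tuples of images."""
    return tuple(p[q[x]] for x in range(len(q)))

# Example group (an arbitrary choice): Z_4 under addition modulo 4.
Z4 = [0, 1, 2, 3]
add4 = lambda a, b: (a + b) % 4
phi = left_regular(Z4, add4)

injective = len(set(phi.values())) == len(Z4)
homomorphism = all(phi[add4(h, g)] == compose(phi[h], phi[g])
                   for h, g in product(Z4, repeat=2))
print(phi[1], injective, homomorphism)    # prints (1, 2, 3, 0) True True
```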
This theorem shows us that any abstract group can be viewed as a permutation group. In particular, every finite group with n elements is isomorphic to a subgroup of $S_n$. In principle, this means that we can derive properties of groups from permutation groups, particularly for small values of n. For example, we can gain insight by exploring the properties of permutations as cycles. But in practice, for larger values of n, the possibilities become far more complicated, and for infinite groups even more possibilities occur.


Partitioning a Group to Obtain a Quotient Group 


So far we have introduced the notion of subgroup: a subset of a group that 
is itself a group. It is also possible to define a group structure by clumping 
group elements together and defining an operation on the set of clumps. We 
have already seen this idea in action when defining $\mathbb{Z}_n$, the integers
modulo n. The clumps are equivalence classes of integers, where two integers are
equivalent if they are congruent modulo n. To add two clumps, we choose 
an element from each, add those, and see which clump the result belongs 
to. This construction provides a second way to analyse a group in terms of 
simpler groups, and it turns out to be intimately related to homomorphisms. 

We formalise the clumps by working with a partition of a group. By ana- 
logy with integers modulo n, we take a partition P of a group G and try to use 
the group operation to define an operation on the equivalence classes of the 
partition P. However, this procedure can run into trouble, because different 
choices of elements from the clumps may lead to inconsistent results. The 
construction of $\mathbb{Z}_n$ works because the clumps have a very regular structure:
elements in the same clump differ by a multiple of n. If we try the same trick
with a less regular partition, it may not work.

For example, suppose we partition Z into {0, 1}, {2, 5, 6}, {3, 8}, and 
various other disjoint pieces. What should {0, 1} + {2, 5,6} be? If we choose 
representative elements 0 and 2, then 0 + 2 = 2 so the sum ought to be the 
clump {2, 5, 6}. But if we choose elements 1 and 2, then 1+ 2 = 3 so the sum 
ought to be the clump {3, 8}. Therefore ‘sum’ is not well defined in this case, 
and the attempt fails. 
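The failure can be seen by trying every choice of representatives. In the sketch below (our own; only the three blocks named in the text are listed, so any sum falling outside them is reported as `None`) the sums of elements of $\{0, 1\}$ and $\{2, 5, 6\}$ land in more than one block. For contrast, choosing representatives from the residue classes of 0 and 1 modulo 3 always gives a sum in the class of 1.

```python
# The three blocks named in the text; the "various other disjoint pieces"
# are not specified, so sums landing there are reported as None.
blocks = [frozenset({0, 1}), frozenset({2, 5, 6}), frozenset({3, 8})]

def block_of(n):
    """Return the listed block containing n, or None if it is not listed."""
    for b in blocks:
        if n in b:
            return b
    return None

S, T = blocks[0], blocks[1]
sums = {block_of(x + y) for x in S for y in T}
print(sums)    # more than one block appears, so S + T is not well defined

# Contrast: residue classes modulo 3 give a single answer for every choice.
mod3_sums = {(x + y) % 3 for x in range(0, 30, 3) for y in range(1, 31, 3)}
print(mod3_sums)    # prints {1}
```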

Working with a general group G, we want to divide the set G into subsets 
that themselves operate as a group. When this is possible, the result is called 
a quotient group. The reason for this name will become clearer later. 

A partition P of G is a set of disjoint (non-empty) subsets of G, so that every element of G lies in precisely one of the subsets in the partition. These subsets are called equivalence classes, and theorem 4.9 of chapter 4 provides a structure theorem for partitions: every partition corresponds to an equivalence relation $\sim$ in which $a \sim b$ if and only if a, b belong to the same equivalence class. We denote the equivalence class containing a by $E_a$, where $E_a = E_b$ if and only if $a \sim b$.

For this partition P to inherit a product from the group, we obtain the product ST of two equivalence classes S, T in P by taking elements $x \in S$ and $y \in T$ and defining ST to be the subset that contains xy. This may also be written by defining $E_xE_y$ to be the subset $E_{xy}$.


Fig. 13.3 Defining a product on a partition of a group


To be able to do this independently of the choice of the individual elements, we need to know that if we take other elements $x' \in S$ and $y' \in T$ then the product $x'y'$ lies in the same subset ST as xy. Another way of saying the same thing is to assert that if $x' \in E_x$ and $y' \in E_y$ then $x'y' \in E_{xy}$, or if $x' \sim x$ and $y' \sim y$ then $x'y' \sim xy$. Only then can the product of elements in the group be used to define a product on the equivalence classes as elements of the group P.

If we can define a group structure on the partition, this must have the 
properties we have proved about groups in general. For instance, the iden- 
tity of any group is unique and the inverse of any element is unique. This 
immediately restricts how a group structure can be defined on a partition. 

For example, there is only one candidate for the identity. The equivalence class I contains the identity $1_G$, so $I^2 = II$ must contain $(1_G)^2 = 1_G$, which implies $I^2 = I$. By theorem 13.8(1), an element X of a group with $XX = X$ must be the identity, so I must be the identity element.


Fig. 13.4 The identity element for a partition 


We can prove more:

Theorem 13.21: If a partition P of a group G has a group structure in which the product of equivalence classes $E_x$ and $E_y$ is defined to be $E_{xy}$, then the identity element of the partition is the equivalence class I containing $1_G$, and I is a subgroup of G.


Proof: If $h, k \in I$, then
$$E_{hk} = E_hE_k = II = I,$$
so $hk \in I$, so I is closed under the group operation.
Also, for any $g \in G$,
$$E_gE_{g^{-1}} = E_{gg^{-1}} = E_{1_G} = I.$$
If $g \in I$ then $E_g = I$, so this reduces to
$$IE_{g^{-1}} = I,$$
which implies that
$$E_{g^{-1}} = I$$
because I is the identity element of P. Therefore $g^{-1} \in I$. Since I also contains $1_G$, it must be a subgroup of the whole group G.


These conditions are therefore necessary to set up a group structure on P 
in the stated manner. However, they are still not sufficient. The other equiva- 
lence classes must also have a special structure. To state what it is we require 
a new construct: the notion of a coset of a subgroup. 


Definition 13.22: Let H be a subgroup of G and let $x \in G$. Then
the left coset of x is $xH = \{xh \in G \mid h \in H\}$
and


the right coset of x is $Hx = \{hx \in G \mid h \in H\}$.


Proposition 13.23: Let G be a group, and let H be a subgroup of G. The 
left cosets of H partition the set G. The right cosets of H also partition 
the set G. 


Proof: First consider the left cosets $\{xH \mid x \in G\}$. Each coset is non-empty, since it contains $x1_G = x$. Every element $g \in G$ lies in at least one coset, namely gH. If two cosets xH, yH contain a common element $g = xh_1 = yh_2$, then $x = yh_2h_1^{-1} = yh$ where $h = h_2h_1^{-1} \in H$, because H is a subgroup. Any element of xH is therefore of the form $xk = yhk$ where $k \in H$. Because H is a subgroup, $hk \in H$, giving $xk = yhk \in yH$. Thus $xH \subseteq yH$.
A corresponding argument shows that $yH \subseteq xH$, hence the left cosets xH and yH are the same.
The proof for right cosets follows the same pattern.


The Number of Elements in a Group 
and a Subgroup 


When we partition a group G into left (or right) cosets of a subgroup H, it is clear that the map between two left cosets xH and yH which maps xh to yh for all $h \in H$ is a bijection. If G is finite, then all the cosets will be the same size.² This means that G is subdivided into a number of equal-sized subsets, which leads to a direct relationship between the number of elements in G and the number of elements in H.


Definition 13.24: The order of a finite group G is the number of elements
in the group and is denoted by the symbol $|G|$.


The use of the term ‘order’ should not be confused with its use in other 
contexts, such as an order relation or the order of elements in a permutation. 
It is part of the traditional theory of groups and you should just get used to it. 


Proposition 13.25: If H is a subgroup of a finite group G then the order 
of H divides the order of G. 


Proof: Let n = |H| be the order of H. Then every left coset has n elem- 
ents, and the cosets are disjoint subsets of the same size that include all the 
elements of G. If there are m distinct cosets, the number of elements in G is 
therefore mn. 


This result is very helpful when seeking subgroups of a given group. For 
a subset to be a subgroup, its order must divide the order of the group. 
For example, when considering the possible subgroups of the permutation 
group S3, which has six elements, the subgroups must have order 1, 2, 3, or 
6 and there are no others. So the subgroups must either be the identity (order 
1), the whole group (order 6), or subgroups of order 2 or 3, which have all 
been identified earlier. 

This proposition has an important consequence for elements of the group. Let $H = \langle g \rangle$, the cyclic subgroup generated by g. Since G is finite, so is H, and by proposition 13.16 it is of the form $\{1, g, g^2, \ldots, g^{n-1}\}$, where $g^n = 1$ and all listed elements are distinct. Therefore the order of this cyclic subgroup is n. This leads to:


² This property also holds for infinite groups, but to explain this requires us to define what we mean by ‘the number of elements in an infinite set’. We consider this in chapter 14.

Definition 13.26: An element $g \in G$ is said to be of finite order if there exists a positive integer n such that $g^n = 1$. The smallest such n is said to be the order of g.


An immediate consequence is: 


Theorem 13.27: If g is an element of a finite group G then the order of g 
divides the order of G. 
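Theorem 13.27 is easy to test on a small example. The sketch below (our own; `element_order` is a hypothetical helper) computes the order of each element of $\mathbb{Z}_7 \setminus \{0\}$ under multiplication modulo 7, one of the groups in examples 13.6, by repeated multiplication. This group has order 6, and every element order found divides 6.

```python
def element_order(g, op, identity):
    """Least n >= 1 with g^n = identity (definition 13.26).
    Terminates for any element of a finite group."""
    n, power = 1, g
    while power != identity:
        power = op(power, g)    # g^(n+1) = g^n g
        n += 1
    return n

p = 7
mul = lambda a, b: (a * b) % p
G = range(1, p)                 # Z_7 \ {0}, a group of order 6
orders = {g: element_order(g, mul, 1) for g in G}
print(orders)                   # prints {1: 1, 2: 3, 3: 6, 4: 3, 5: 6, 6: 2}
```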


Partitions that Define a Group Structure 
Now that we can partition a group G using left or right cosets, we ask when it is possible to define a group operation on the partition, using the operation
$$xH * yH = xyH \tag{13.2}$$
for left cosets, or
$$Hx * Hy = Hxy \tag{13.3}$$


for right cosets. We show that this is possible if and only if the left and right cosets are the same. In such a case we would have


$$xH = Hx \quad \text{for all } x \in G,$$


and the two rules (13.2) and (13.3) will give the same result. 
We make the following definition: 


Definition 13.28: A subgroup H of a group G is a normal subgroup if the left and right cosets gH and Hg are equal for all $g \in G$.


The condition $gH = Hg$ does not mean that $gh = hg$ for every $h \in H$. It simply requires that, for each $h \in H$, $gh = kg$ for some $k \in H$. This means that the element $k = ghg^{-1}$ lies in H, which gives rise to the following:


Alternative Definition 13.29: A subgroup H of a group G is a normal subgroup if for every $h \in H$ and $g \in G$, the element $ghg^{-1} \in H$.


Symbolically, if H is normal, we write $H \lhd G$.


Example 13.30: Consider our old friend $S_3$ and the two subgroups
$$H = \{i, \mu\}, \quad K = \{i, \rho, \rho^2\}$$

where the mirror symmetry $\mu$ satisfies $\mu^2 = i$ and the rotation $\rho$ satisfies $\rho^3 = i$.

Case 1: the subgroup $H = \{i, \mu\}$.

Here the left coset $\rho H$ is $\{\rho i, \rho\mu\} = \{\rho, \rho\mu\}$. But the right coset $H\rho$ is $\{i\rho, \mu\rho\} = \{\rho, \mu\rho\}$. These cosets are not equal. If we take the six elements of $S_3$ and partition them into left and right cosets of H, then we get the following:


Fig. 13.5 Different left and right cosets (left cosets of H: $H$, $\rho H$, $\rho^2H$; right cosets of H: $H$, $H\rho$, $H\rho^2$)


Using the relations $\mu\rho = \rho^2\mu$ and $\rho\mu = \mu\rho^2$, we see that $\rho H \ne H\rho$ and $\rho^2H \ne H\rho^2$, so the subgroup H is not normal. If we try to define the product of two left cosets, say H and $\rho H$, by selecting an element in each and multiplying them together, the results may lie in different cosets. For instance, if we select $i \in H$ and $\rho \in \rho H$, then their product $i\rho = \rho \in \rho H$, but if we choose $\mu \in H$ and $\rho \in \rho H$, then their product $\mu\rho = \rho^2\mu \in \rho^2H$.

On the other hand, subgroup K is different:

Case 2: the subgroup $K = \{i, \rho, \rho^2\}$.

The left coset $\rho K$ is
$$\rho K = \{\rho i, \rho\rho, \rho\rho^2\} = \{\rho, \rho^2, i\}$$
and the right coset $K\rho$ is
$$K\rho = \{i\rho, \rho\rho, \rho^2\rho\} = \{\rho, \rho^2, i\}.$$

Using the element $\mu$ instead, the left and right cosets are still the same. The left coset $\mu K$ is
$$\mu K = \{\mu i, \mu\rho, \mu\rho^2\}$$
and the right coset $K\mu$ is
$$K\mu = \{i\mu, \rho\mu, \rho^2\mu\} = \{\mu, \mu\rho^2, \mu\rho\}.$$

In this case, the partition of $S_3$ has two equivalence classes, K and $\mu K$, which can be written in different ways as follows:

$$K = \rho K = \rho^2K = K\rho = K\rho^2$$
$$\mu K = \mu\rho K = \mu\rho^2K = K\mu = K\mu\rho = K\mu\rho^2$$
Fig. 13.6 Identical left and right cosets


Now K is a normal subgroup, and the set of cosets forms a group with elements K and $\mu K$ where K is the identity element and $(\mu K)^2 = K$.
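Example 13.30 can be reproduced by machine. The sketch below (our own; it encodes a permutation of {0, 1, 2} as the tuple of its images, and the tuples chosen for $\rho$ and $\mu$ are an assumed labelling) lists the left and right cosets of H and K in $S_3$ and tests the normality condition of definition 13.28.

```python
from itertools import permutations

def compose(p, q):
    """(p o q)(x) = p(q(x)) for permutations stored as tuples of images."""
    return tuple(p[q[x]] for x in range(len(q)))

i, rho, mu = (0, 1, 2), (1, 2, 0), (0, 2, 1)   # assumed labelling of S_3
S3 = list(permutations(range(3)))
H = {i, mu}
K = {i, rho, compose(rho, rho)}

def left_cosets(N):
    return {frozenset(compose(g, n) for n in N) for g in S3}

def right_cosets(N):
    return {frozenset(compose(n, g) for n in N) for g in S3}

def is_normal(N):
    # Definition 13.28: gN = Ng for every g in S_3.
    return all({compose(g, n) for n in N} == {compose(n, g) for n in N}
               for g in S3)

print(len(left_cosets(H)), left_cosets(H) == right_cosets(H), is_normal(H))  # prints 3 False False
print(len(left_cosets(K)), left_cosets(K) == right_cosets(K), is_normal(K))  # prints 2 True True
```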


This is true in general for normal subgroups: 


Theorem 13.31: If G is a group and N is a normal subgroup, then the partition P consisting of the subset N and the cosets gN for all $g \in G$ forms a group under the product


$$(gN)(hN) = ghN.$$


Proof: It is essential in this proof to build on the formal definition of a group 
and to ascertain that the operations are all well defined. 

First, suppose that $x'N = xN$ and $y'N = yN$; then $x' = xh$, $y' = yk$ for some $h, k \in N$. So
$$x'y' = xhyk = x(yy^{-1})hyk = xy(y^{-1}hy)k = xyn \quad \text{where } n = (y^{-1}hy)k.$$
Because N is normal, $y^{-1}hy \in N$ (take $g = y^{-1}$ in alternative definition 13.29), and because $k \in N$ and N is a subgroup,
$$n = (y^{-1}hy)k \in N.$$
Therefore
$$x'y' = xyn \in xyN$$


and the cosets $x'y'N$ and $xyN$ are the same.

The remainder of the proof is simple. The identity element is N and the inverse of xN is $x^{-1}N$. Associativity for multiplication of equivalence classes follows from associativity in G.


Now we can see why this is called a quotient group. This theorem shows 
that for any normal subgroup N of G we can partition the group G into its co- 
sets and define a group structure on them. The partition is denoted by G/N. 
These cosets are all the same size, in the sense that there is a one-to-one correspondence between any two of them. In particular, if G is a finite group of order $|G|$, then each coset has the same number of elements as the order $|N|$ of the normal subgroup. The order of G/N is therefore


$$|G/N| = |G|/|N|.$$


Theorem 13.31 tells us that if N is a normal subgroup of a group G, then the 
group operation on G naturally leads to a group structure on the quotient 
group G/N. We can say something stronger: that this is the only way that a 
partition of G can be given a group structure inherited from that of G. 

To understand why, we introduce a more general notation for multiplying any two subsets X, Y of a group G. (We do not assume any further properties; they need not be subgroups, for example.) The product is

$$XY = \{xy \in G \mid x \in X, y \in Y\}.$$
For example, in $S_3$
$$\{\rho, \mu\}\{i, \rho^2\} = \{\rho i, \mu i, \rho\rho^2, \mu\rho^2\}.$$
Similarly, if $X \subseteq G$ and $g \in G$, we define
$$X^2 = XX, \quad gX = \{g\}X = \{gx \in G \mid x \in X\}, \quad Xg = X\{g\} = \{xg \in G \mid x \in X\}.$$
Multiplication of subsets is obviously associative, and the general associative law applies. Therefore, if $g, h \in G$ then gNh is defined unambiguously (as either g(Nh) or (gN)h, which are equal). If N is a normal subgroup, we can now write:

$$(gN)(hN) = g(Nh)N = g(hN)N = ghN^2 = ghN.$$
The last step uses $N^2 = N$, which holds because N is a subgroup: $NN \subseteq N$ by closure, and each $n \in N$ equals $n1_G \in NN$. Now it is evident why multiplication of cosets works for a normal subgroup N. Multiplication of elements in the group may not be commutative, but multiplication of any element $g \in G$ by N does commute, in the sense that $gN = Ng$. So the operation on cosets given by

$$(gN)(hN) = ghN$$

is well defined.
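The subset product XY is straightforward to compute, which gives a direct check that $(gN)(hN) = ghN$ holds for every pair of cosets when N is normal. In the sketch below (our own; the helper names and the tuple encoding of permutations of {0, 1, 2} are assumptions) this holds for $K = \{i, \rho, \rho^2\}$ but fails for $H = \{i, \mu\}$, where for instance $H(\rho H)$ has four elements and so is not a coset at all.

```python
from itertools import permutations

def compose(p, q):
    """(p o q)(x) = p(q(x)) for permutations stored as tuples of images."""
    return tuple(p[q[x]] for x in range(len(q)))

def subset_product(X, Y):
    """XY = {xy | x in X, y in Y}, as defined in the text."""
    return frozenset(compose(x, y) for x in X for y in Y)

i, rho, mu = (0, 1, 2), (1, 2, 0), (0, 2, 1)   # assumed labelling of S_3
S3 = list(permutations(range(3)))
K = frozenset({i, rho, compose(rho, rho)})      # normal
H = frozenset({i, mu})                          # not normal

def coset(g, N):
    return frozenset(compose(g, n) for n in N)

def products_are_cosets(N):
    # Check (gN)(hN) == ghN for every pair g, h in S_3.
    return all(subset_product(coset(g, N), coset(h, N)) == coset(compose(g, h), N)
               for g in S3 for h in S3)

print(products_are_cosets(K), products_are_cosets(H))   # prints True False
```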
This leads to the major structure theorem for quotient groups and normal 
subgroups: 


Theorem 13.32 (Structure Theorem for a Partition of a Group): If G is a group and P is a partition of the underlying set G, then P is a group with the operation inherited from G if and only if the class N in P containing $1_G$ is a normal subgroup and P is the quotient group G/N.

Proof: Theorem 13.31 proves that if N is a normal subgroup, then the partition G/N is a group under the operation $(gN)(hN) = ghN$.

Conversely, we have shown above that whenever a partition P inherits a 
group structure, the identity element in P must be a normal subgroup, and 
the other equivalence classes must be its left (and right) cosets. 


The Structure of Group Homomorphisms 


We now prove a structure theorem for group homomorphisms, relating them to normal subgroups. Suppose that $\phi: G \to H$ is a homomorphism. A homomorphism need not be injective, and it need not be surjective. Not being injective makes homomorphisms worth studying, because this can partition a complicated group G into simpler pieces. But not being surjective makes very little difference, as the image


$$\operatorname{im}(\phi) = \{\phi(g) \mid g \in G\}$$
(which we previously denoted by $\phi(G)$) is simply a subgroup of H:


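Proposition 13.33: If G, H are groups and $\phi: G \to H$ is a homomorphism, then $\operatorname{im}(\phi)$ is a subgroup of H.

Proof: If $g, h \in G$ then, by theorem 13.18(2) with $n = -1$ (whose proof uses only the homomorphism property), $\phi(h^{-1}) = (\phi(h))^{-1}$, so
$$\phi(g)(\phi(h))^{-1} = \phi(g)\phi(h^{-1}) = \phi(gh^{-1}) \in \operatorname{im}(\phi).$$


By theorem 13.10, $\operatorname{im}(\phi)$ is a subgroup.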


The homomorphism $\phi$ also gives rise to a special subgroup in G:
Definition 13.34: Let $\phi: G \to H$ be a homomorphism. The kernel of $\phi$ is
$$\ker(\phi) = \{g \in G \mid \phi(g) = 1_H\}.$$


We can then prove: 


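Theorem 13.35: Let $\phi: G \to H$ be a homomorphism. Then the kernel of $\phi$ is a normal subgroup of G.
Proof: The kernel is a subgroup by theorem 13.10: it contains $1_G$, because $\phi(1_G)\phi(1_G) = \phi(1_G)$ forces $\phi(1_G) = 1_H$ by theorem 13.8(1), and if $h, k \in \ker(\phi)$ then $\phi(hk^{-1}) = \phi(h)(\phi(k))^{-1} = 1_H$. Now if $h \in \ker(\phi)$, then $\phi(h) = 1_H$, so for any $g \in G$,

$$\phi(ghg^{-1}) = \phi(g)\phi(h)\phi(g^{-1}) = \phi(g)1_H\phi(g^{-1}) = \phi(g)\phi(g^{-1}) = \phi(gg^{-1}) = 1_H.$$
Therefore $ghg^{-1} \in \ker(\phi)$, and $\ker(\phi)$ is a normal subgroup.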


This leads immediately to:

Theorem 13.36 (Structure Theorem for Group Homomorphisms): Let G and H be groups and $\phi: G \to H$ be a homomorphism. Then


$$G/\ker(\phi) \cong \operatorname{im}(\phi).$$


Proof: Let $N = \ker(\phi)$, which is a normal subgroup of G. Then G/N consists of the left cosets gN for $g \in G$, and the group operation is setwise multiplication. Define the map $u: G/N \to \operatorname{im}(\phi)$ by


$$u(gN) = \phi(g).$$


This is certainly well defined, for if $gN = hN$ then $g = hn$ for some $n \in N$, so $\phi(n) = 1_H$ and


$$u(gN) = \phi(g) = \phi(hn) = \phi(h)\phi(n) = \phi(h)1_H = \phi(h) = u(hN).$$
u is a homomorphism because
$$u(xN\,yN) = u(xyN) = \phi(xy) = \phi(x)\phi(y) = u(xN)u(yN).$$
It is injective because given $u(gN) = u(hN)$, then $\phi(g) = \phi(h)$, so
$$\phi(g^{-1}h) = \phi(g^{-1})\phi(h) = (\phi(g))^{-1}\phi(h) = 1_H.$$
So $g^{-1}h \in N$, implying $g^{-1}hN = N$, so $hN = gN$ and u is injective.
It is also surjective, because, given any $k \in \operatorname{im}(\phi)$, then $k = \phi(g)$ for some $g \in G$ and so
$$k = \phi(g) = u(gN).$$


Hence u is an isomorphism.
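The structure theorem can be checked on a small example. The sketch below (our own choice of example, not from the text) takes the homomorphism $\phi: \mathbb{Z}_{12} \to \mathbb{Z}_4$ given by reduction modulo 4, which is well defined because 4 divides 12. It computes the kernel and its cosets, and confirms that $u(a + \ker(\phi)) = \phi(a)$ is well defined and matches the cosets one-to-one with the image.

```python
n, m = 12, 4          # m divides n, so reduction mod m is well defined on Z_n
G = range(n)
phi = lambda a: a % m                  # phi: Z_12 -> Z_4

# phi is a homomorphism of additive groups.
assert all(phi((a + b) % n) == (phi(a) + phi(b)) % m for a in G for b in G)

kernel = {a for a in G if phi(a) == 0}
image = {phi(a) for a in G}
cosets = {frozenset((a + k) % n for k in kernel) for a in G}   # elements of G/ker

# u(a + ker) = phi(a) is well defined: phi is constant on each coset.
u = {}
for c in cosets:
    values = {phi(a) for a in c}
    assert len(values) == 1
    u[c] = values.pop()

# Distinct cosets give distinct values covering the whole image.
print(sorted(kernel), len(cosets), sorted(u.values()) == sorted(image))   # prints [0, 4, 8] 4 True
```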


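Example 13.37: For the additive group of integers $\mathbb{Z}$, the set $n\mathbb{Z} = \{nm \in \mathbb{Z} \mid m \in \mathbb{Z}\}$ of all multiples of n is a subgroup of $\mathbb{Z}$ under addition. Since $\mathbb{Z}$ is abelian, every subgroup is normal, so the quotient group $\mathbb{Z}/n\mathbb{Z}$ exists. Here the operation is addition and the cosets should be written as $k + n\mathbb{Z}$. For instance, if $n = 3$, then the cosets are


$$3\mathbb{Z} = \{\ldots, -6, -3, 0, 3, 6, \ldots\}$$
$$1 + 3\mathbb{Z} = \{\ldots, -5, -2, 1, 4, 7, \ldots\}$$
$$2 + 3\mathbb{Z} = \{\ldots, -4, -1, 2, 5, 8, \ldots\}.$$

For $n > 1$, we have
$$\mathbb{Z}/n\mathbb{Z} \cong \mathbb{Z}_n,$$
by theorem 13.36 applied to the epimorphism $\mathbb{Z} \to \mathbb{Z}_n$ taking an integer to its remainder modulo n, whose kernel is $n\mathbb{Z}$.


For $n = 0$, we have $0\mathbb{Z} = \{0\}$ and $\mathbb{Z}/0\mathbb{Z} \cong \mathbb{Z}$. For negative n, we have $n\mathbb{Z} = (-n)\mathbb{Z}$.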
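When computing with $\mathbb{Z}/n\mathbb{Z}$ it helps to pick the representative of $k + n\mathbb{Z}$ in $\{0, 1, \ldots, n-1\}$. In Python (our choice of language) the `%` operator does this directly for a positive modulus, even for negative k; this is a language-specific fact, and in C, for example, `-5 % 3` is `-2`. The sketch below (`coset_rep` is a hypothetical helper) also checks that adding representatives agrees with adding cosets.

```python
def coset_rep(k, n):
    """Representative in {0, ..., n-1} of the coset k + nZ, for n > 0.
    Python's % is non-negative for a positive modulus, even when k < 0."""
    return k % n

print([coset_rep(k, 3) for k in range(-6, 3)])   # prints [0, 1, 2, 0, 1, 2, 0, 1, 2]

# Adding cosets is well defined: the class of a + b depends only on the
# classes of a and b.
assert all(coset_rep(a + b, 3) == coset_rep(coset_rep(a, 3) + coset_rep(b, 3), 3)
           for a in range(-20, 20) for b in range(-20, 20))
```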
The Structure of Groups


Now we have structure theorems that enable us to think about groups not just as a list of axioms and subsequent theorems but as a crystalline concept that we can imagine in our minds. As we saw in theorem 13.20, every group $G$ is isomorphic to a group of permutations of a set of objects. In particular, it is isomorphic to a subgroup of the permutation group $S_G$ of the underlying set $G$.

If we attempt to take a partition of a group G into subsets and define a 
group structure on the partition, then this can be done if, and only if, one of 
the subsets in the partition is a normal subgroup K of G and the other subsets 
in the partition are cosets of K. 

If we have a homomorphism (a function preserving the group operation) $\phi : G \to H$ from a group $G$ to another group $H$, then $\ker(\phi)$ (the elements in $G$ mapping onto the identity in $H$) is a normal subgroup of $G$, the image $\operatorname{im}(\phi)$ is a subgroup of $H$, and the quotient group $G/\ker(\phi)$ is isomorphic to $\operatorname{im}(\phi)$.

In general, a group G formulated in terms of a set-theoretic definition may 
be seen as a group of permutations on a set X. 

We saw earlier that the permutation group $S_3$ can be considered as the group of symmetries of an equilateral triangle, where $\rho$ is a rotation through an angle $2\pi/3$, $\mu$ is a mirror reflection in a line of symmetry through one of the vertices, and the permutations are of the form $\rho^p\mu^q$ where $0 \le p \le 2$ and $0 \le q \le 1$.
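To see the six symmetries concretely, the sketch below labels the vertices of the triangle 0, 1, 2 (an assumed labelling), takes $\rho$ to send vertex $i$ to $i + 1 \pmod 3$ and $\mu$ to be the reflection fixing vertex 0, and builds every product $\rho^p\mu^q$. It checks that these six products are exactly the permutations of the three vertices and that $\rho^3 = \mu^2$ is the identity with $\mu\rho = \rho^2\mu$.

```python
from itertools import permutations

def compose(s, t):
    # s t means "t first, then s", matching the book's convention
    return tuple(s[t[x]] for x in range(len(t)))

def power(s, k):
    result = tuple(range(len(s)))
    for _ in range(k):
        result = compose(s, result)
    return result

identity = (0, 1, 2)
rho = (1, 2, 0)   # rotation through 2*pi/3: vertex i -> i + 1 (mod 3)
mu = (0, 2, 1)    # reflection in the line of symmetry through vertex 0

elements = {compose(power(rho, p), power(mu, q)) for p in range(3) for q in range(2)}

print(len(elements), elements == set(permutations(range(3))))
print(power(rho, 3) == identity, power(mu, 2) == identity)
print(compose(mu, rho) == compose(power(rho, 2), mu))
```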

In the same way, other geometric figures have a group of symmetries. For 
instance, a square has eight symmetries: four rotations (one being the iden- 
tity) and four reflections. A regular n-gon has 2n symmetries; a circle has 
infinitely many. 
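The count of $2n$ symmetries for a regular $n$-gon can be checked by machine. This sketch assumes the vertices are labelled $0, \ldots, n-1$ in order, takes one rotation (turning through $2\pi/n$) and one reflection (the one fixing vertex 0), and closes the set under composition; for $n \ge 3$ the closure should have exactly $2n$ elements. The function name is hypothetical.

```python
def compose(s, t):
    return tuple(s[t[x]] for x in range(len(t)))

def symmetry_group(n):
    """Close {rotation, reflection} of a regular n-gon under composition."""
    rotation = tuple((i + 1) % n for i in range(n))   # turn through 2*pi/n
    reflection = tuple((-i) % n for i in range(n))    # flip fixing vertex 0
    start = tuple(range(n))
    group, frontier = {start}, [start]
    while frontier:
        g = frontier.pop()
        for generator in (rotation, reflection):
            h = compose(generator, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

for n in range(3, 9):
    print(n, len(symmetry_group(n)))
```

Because the group is finite, closing under products alone is enough: inverses appear automatically as powers of each element.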

The theory of groups can be used to formulate properties of symmetries, 
particularly in geometry. Historically, however, the first developments in 
group theory arose in algebra in the early nineteenth century. In the next sec- 
tion we consider the broader evolution of mathematical ideas as part of an 
overall vision without being encumbered by the step-by-step detail of that 
development. This is intended to give you an overall picture of the use of 
group theory in courses you may encounter later in your studies. 


Major Contributions of Group Theory throughout 
Mathematics 


The abstract notion of a group developed from groups of permutations, which were first made explicit by Évariste Galois in connection with solutions of polynomial equations using algebraic formulas. By this we mean formulas that are constructed from the coefficients using addition, subtraction, multiplication, and division, and also $p$th roots for integers $p$. (We may assume $p$ is prime here, since, for example, $\sqrt[4]{a} = \sqrt{\sqrt{a}}$, and so on.)

Niels Henrik Abel had proved that no such formula exists for the quintic 
equation, but Galois placed his result in a more general context: when can a 
polynomial equation be solved by a formula, and when not? His answer was 
that there is a group G of permutations associated with any such equation— 
basically, those permutations of its roots that preserve all algebraic relations 
between them—and there is a formula if and only if this group has a very 
special kind of structure. Namely, G has a sequence of subgroups 


$$G = G_0 \trianglerighteq G_1 \trianglerighteq G_2 \trianglerighteq G_3 \trianglerighteq \cdots \trianglerighteq G_k = \{1\}$$
where each $G_{i+1}$ is a normal subgroup of $G_i$ and the quotient group $G_i/G_{i+1}$ has prime order. Roughly speaking, each piece corresponds to part of the formula that takes the $p$th root of something, where $p$ is the prime concerned.

Galois observed in particular that when the equation is a general quintic, the group $G$ consists of all permutations of the five roots, so it is $S_5$. This has order 120, and it has only three normal subgroups: $S_5$ itself, the trivial subgroup $\{1\}$, and a subgroup called $A_5$ with order 60. Since 120 is not prime, we have to start the sequence with $G_1 = A_5$ (the quotient $S_5/A_5$ has order 2, which is prime). But Galois also observed that the only normal subgroups of $A_5$ are $A_5$ and $\{1\}$. (A normal subgroup of a normal subgroup of $G$ need not be a normal subgroup of $G$, so this also needs proof.) Since neither $A_5/A_5$ (of order 1) nor $A_5/\{1\}$ (of order 60) has prime order, the sequence gets stuck at $A_5$; that is, there is no sequence of the required kind. Therefore the quintic can't be solved by a formula.
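The claim that $A_5$ has no normal subgroups other than $A_5$ and $\{1\}$ can be checked by brute force. The sketch below uses a standard counting argument (not necessarily Galois's own): a normal subgroup is closed under conjugation, so it is a union of conjugacy classes that includes the identity, and by Lagrange's theorem its order divides 60. Permutations of $\{0, \ldots, 4\}$ are stored as tuples; the helper names are our own.

```python
from itertools import combinations, permutations

def compose(s, t):
    return tuple(s[t[x]] for x in range(len(t)))

def inverse(s):
    inv = [0] * len(s)
    for x, y in enumerate(s):
        inv[y] = x
    return tuple(inv)

def is_even(s):
    n = len(s)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if s[i] > s[j])
    return inversions % 2 == 0

S5 = list(permutations(range(5)))
A5 = [s for s in S5 if is_even(s)]

# Conjugacy classes of A5 (conjugating only by elements of A5)
remaining = set(A5)
class_sizes = []
while remaining:
    x = remaining.pop()
    cls = {compose(compose(g, x), inverse(g)) for g in A5}
    remaining -= cls
    class_sizes.append(len(cls))
class_sizes.sort()

# Orders of unions of classes containing the identity class that divide 60
others = list(class_sizes)
others.remove(1)
possible_orders = set()
for r in range(len(others) + 1):
    for chosen in combinations(others, r):
        order = 1 + sum(chosen)
        if 60 % order == 0:
            possible_orders.add(order)

print(len(S5), len(A5), class_sizes, sorted(possible_orders))
```

Only the orders 1 and 60 survive, so the only normal subgroups are $\{1\}$ and $A_5$ itself.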

A similar algebraic technique resolved the classic problems of whether one could duplicate a cube or trisect an angle in Euclidean geometry. The answer is a resounding 'No!' Using algebra to interpret the intersections of lines and circles essentially involves finding the solution of successive quadratic equations which each have two solutions, and the permutation group of each successive quadratic equation involves groups of order $2, 2^2$, and so on. But duplicating a cube with sides of length 1 involves constructing a cube side $x$ whose volume satisfies $x^3 = 2$. This equation has 3 complex roots and the corresponding group of permutations is the whole of $S_3$, of order 6, which is not a power of 2. So it is not possible to duplicate the cube in Euclidean geometry.

If there were a technique for trisecting an angle, then we could apply it to trisect $30°$, which would lead to the construction of the angle $\theta = 10°$ and, in particular, to the value of $x = \sin 10°$. But using the formula $\sin 3\theta = 3\sin\theta - 4\sin^3\theta$ with $\sin 30° = \tfrac{1}{2}$, it can be shown that $x = \sin\theta$ satisfies the cubic equation $8x^3 - 6x + 1 = 0$, whose solutions have a permutation group of order 3, not a power of 2. Again, an algebraic proof using permutations solved a geometric problem that had puzzled mathematicians for 2000 years.
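Both cubics can be examined with exact rational arithmetic. This sketch relies on a standard fact not proved in this chapter: an irreducible cubic over $\mathbb{Q}$ has a permutation (Galois) group of order 3 when its discriminant is the square of a rational number, and of order 6 otherwise. Irreducibility is checked with the rational root test; the function names are our own.

```python
from fractions import Fraction
from math import isqrt, radians, sin

def divisors(n):
    return [k for k in range(1, n + 1) if n % k == 0]

def has_rational_root(a, b, c, d):
    """Rational root test for a*x^3 + b*x^2 + c*x + d with integer coefficients."""
    if d == 0:
        return True
    for p in divisors(abs(d)):
        for q in divisors(abs(a)):
            for x in (Fraction(p, q), Fraction(-p, q)):
                if a * x**3 + b * x**2 + c * x + d == 0:
                    return True
    return False

def discriminant(a, b, c, d):
    return b*b*c*c - 4*a*c**3 - 4*b**3*d - 27*a*a*d*d + 18*a*b*c*d

def is_square(n):
    return n >= 0 and isqrt(n) ** 2 == n

def galois_group_order(a, b, c, d):
    """Order of the permutation group of an irreducible cubic over Q: 3 or 6."""
    if has_rational_root(a, b, c, d):
        raise ValueError("cubic is reducible over Q")
    return 3 if is_square(discriminant(a, b, c, d)) else 6

def is_power_of_two(n):
    return n > 0 and n & (n - 1) == 0

duplication = (1, 0, 0, -2)    # x^3 - 2 = 0
trisection = (8, 0, -6, 1)     # 8x^3 - 6x + 1 = 0

x = sin(radians(10))
print(abs(8 * x**3 - 6 * x + 1) < 1e-12)   # sin 10 degrees is a root
for coeffs in (duplication, trisection):
    order = galois_group_order(*coeffs)
    print(coeffs, discriminant(*coeffs), order, is_power_of_two(order))
```

The discriminant of $x^3 - 2$ is $-108$, so its group has order 6; that of $8x^3 - 6x + 1$ is $5184 = 72^2$, so its group has order 3. Neither order is a power of 2.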

It took about 40 years after Galois’ death in a duel for his ideas to be prop- 
erly appreciated. For instance, Klein realised their implications in geometry. 
In 1872, he published a unified framework (called the Erlanger Programme) 
for classifying different forms of geometry. For two millennia, geometry had 
focused on Euclidean geometry in the plane and in three-dimensional space, 
using concepts such as congruent triangles and the parallel property, but over time, new forms of geometry had emerged.

During the Renaissance, painters had developed the idea of representing 
scenes on a canvas by imagining they were looking through a glass with their 
eye in a fixed position and painting the scene on the canvas to represent what 
they saw. This gave rise to projective geometry, projecting three dimensions 
onto a two-dimensional plane. The picture could be transformed by moving 
the position of the eye to look at the scene from a different viewpoint. Un- 
der such a transformation, points remained points, straight lines remained 
straight, but angles could change and circles could be transformed into ellip- 
ses. From Klein’s viewpoint, points and straight lines were invariant concepts 
in projective geometry but angles and circles were not. 

Klein realised that different forms of geometry could be described by gen- 
eralising the algebraic language of permutations introduced by Galois, with 
each form of geometry operating on a set and focusing on properties that 
remained invariant under the transformations available in the theory. 

The notion of symmetry could now be interpreted in a more general sense 
as a bijection on a set that preserves some specified kind of structure. It could 
be a shape (rigid motions), it could be an algebraic formula (Galois group), 
it could be a property like ‘being a solution of a specific differential equa- 
tion’. In this way, groups are ‘really’ about symmetry and offer powerful new 
principles in mathematics (see [33]).

For instance, we have already seen that the group of permutations $S_3$ can be represented as the group of symmetries of an equilateral triangle, consisting of a rotation $\rho$ through a third of a full turn, a flip $\mu$ over a line of symmetry through a vertex, together with combinations of these permutations, where $\rho^3 = \mu^2$ is the identity and $\mu\rho = \rho^2\mu$.

We have also already seen that if we replace the equilateral triangle by 
other subsets of the plane, we obtain similar results: a square has eight sym- 
metries (four rotations including the identity and four reflections), a regular 
n-gon has 2n symmetries, a circle has infinitely many. 

These ideas can be further generalised to the entire plane. For instance, 
a tiling of the plane by congruent squares, like an infinite chessboard, has 
infinitely many translational, rotational, and reflectional symmetries. 
This leads to a new way of viewing Euclidean geometry as the study of
rigid motions of the plane that preserve the distance between points. These 
include translations that shift all points in a fixed direction, rotations through 
an angle about a fixed point, and reflections, which flip the plane over a fixed 
line to produce a mirror image. We can show that all rigid motions of the 
plane arise by combinations of translations, rotations, and reflections. 

Formally, a rigid motion of the plane can be described as a map $f : \mathbb{R}^2 \to \mathbb{R}^2$ where the distance between two points $x$ and $y$ is the same as the distance between $f(x)$ and $f(y)$. This means that if we pick three non-collinear points $A$, $B$, $C$ forming a triangle then the transformed triangle $A'B'C'$ will have the lengths of the sides maintained: $AB = A'B'$, $BC = B'C'$, $CA = C'A'$. We can show that any rigid motion can then be constructed by using a translation, a rotation, and a reflection, as follows.

First translate the plane so that A moves to A’. Then, because AB and A’B’ 
are the same length, rotate the plane around A’ so that the rotated line co- 
incides with A'B’. At this stage the rotated triangle may coincide with the 
triangle A’B’C’, or it may be a reflection in the line A’B’. The rigid motion 
of the plane is then seen as a successive application of a translation, then a 
rotation, and, if needed, a reflection. 


[Figure, four panels: start; translate $ABC$ (moving $A$ to $A'$); rotate $ABC$ about $A'$ (to align $AB$ with $A'B'$); reflect $ABC$ in $A'B'$ (if necessary).]

Fig. 13.7 A rigid transformation as a translation, rotation, and perhaps a reflection
Formally, the successive transformations can be written as follows. The first translation takes any point $(x, y)$ to $(x + a, y + b)$ and can be written as
$$T_{(a,b)}(x, y) = (x + a, y + b),$$
or more compactly as
$$T_u(z) = z + u, \tag{13.4}$$
where $z = (x, y)$ is a general point in the plane and $u = (a, b)$ is the specific vector representing the translation.

The same expression can also be interpreted as addition of complex 
numbers where z = x + iy and u = a+ ib. 
The rotation $R_{v,\alpha}$ around a fixed point $v = (c, d)$ through an angle $\alpha$ can be expressed in Cartesian coordinates. However, it has a more compact expression using complex numbers, where multiplying by $e^{i\alpha}$ turns a complex number through the angle $\alpha$ to give:
$$R_{v,\alpha}(z) = v + (z - v)e^{i\alpha}. \tag{13.5}$$


Finally, reflecting the plane in the horizontal axis can be expressed simply in complex numbers by taking $z = x + iy$ to $\bar{z} = x - iy$. To perform a general reflection of a point $z$ in a line through a point $v = (c, d)$ at an angle $\beta$ to the horizontal is more sophisticated, but it can be done in successive moves: first move $z$ to $z - v$ (shifting the point $v$ to the origin), then turn the plane through an angle $-\beta$ to make the line horizontal, taking $z - v$ to $(z - v)e^{-i\beta}$, then flip the plane over the horizontal axis to get $\overline{(z - v)e^{-i\beta}} = \overline{(z - v)}\,e^{i\beta}$, then turn the plane back through an angle $+\beta$ to return the line to its original direction, and finally shift back by $v$. This gives the final mirror position of the original point $z$ as
$$M_{v,\beta}(z) = v + \overline{(z - v)}\,e^{i\beta}e^{i\beta} = v + \overline{(z - v)}\,e^{2i\beta}. \tag{13.6}$$


The full transformation to shift the triangle $ABC$ to the position $A'B'C'$ can therefore be performed by a translation $T_u$, followed by a rotation $R_{v,\alpha}$, and then, if it is necessary to flip the triangle over, by $M_{v,\beta}$. Any rigid motion of the plane can be written as a composite function taking $z$ to
$$(M_{v,\beta})^k \circ R_{v,\alpha} \circ T_u(z) \quad \text{for } u, v \in \mathbb{R}^2,\ 0 \le \alpha < 2\pi,\ 0 \le \beta < 2\pi \text{ and } k = 0 \text{ or } 1.$$
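The translate-rotate-reflect recipe can be followed step by step with Python's complex numbers standing in for points of the plane. In this sketch (helper names and the two "unknown" motions are illustrative assumptions) `decompose` moves $A$ to $A'$, turns about $A'$ so that $AB$ lies along $A'B'$, and reflects in the line $A'B'$ only if $C$ has not yet reached $C'$; the reflection uses $z \mapsto v + \overline{(z - v)}e^{2i\beta}$. Since a rigid motion is fixed by where it sends three non-collinear points, the rebuilt map should also agree at an extra point $P$.

```python
import cmath

def translate(u):
    return lambda z: z + u                                     # (13.4)

def rotate(v, alpha):
    return lambda z: v + (z - v) * cmath.exp(1j * alpha)       # (13.5)

def reflect(v, beta):
    # reflection in the line through v at angle beta to the horizontal
    return lambda z: v + (z - v).conjugate() * cmath.exp(2j * beta)

def decompose(A, B, C, A2, B2, C2, tol=1e-9):
    """Rebuild the rigid motion sending A, B, C to A2, B2, C2 (non-collinear)."""
    T = translate(A2 - A)                                # move A onto A'
    R = rotate(A2, cmath.phase((B2 - A2) / (B - A)))     # turn AB onto A'B'
    k = 0 if abs(R(T(C)) - C2) < tol else 1              # is a flip needed?
    M = reflect(A2, cmath.phase(B2 - A2))                # mirror in line A'B'
    def motion(z):
        w = R(T(z))
        return M(w) if k == 1 else w
    return motion, k

# Two "unknown" rigid motions to recover (arbitrary illustrative choices)
direct = lambda z: rotate(2 + 1j, 1.1)(z + (3 - 2j))
opposite = lambda z: reflect(0.5 - 1j, 0.3)(direct(z))

A, B, C, P = 0j, 1 + 0j, 1j, 2.5 + 0.7j     # P is an extra test point
results = {}
for name, hidden in (("direct", direct), ("opposite", opposite)):
    motion, k = decompose(A, B, C, hidden(A), hidden(B), hidden(C))
    agrees = all(abs(motion(z) - hidden(z)) < 1e-9 for z in (A, B, C, P))
    results[name] = (k, agrees)
print(results)
```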


This shows that translations, rotations, and mirror images generate the 
whole of the group of rigid transformations. To complete the descrip- 
tion of the group we need to identify the relations between these rigid 
transformations. These include relationships such as: 


$$\begin{aligned}
T_0 &= i\\
T_u \circ T_w &= T_{u+w}\\
R_{v,0} &= i\\
R_{v,\alpha} \circ R_{v,\beta} &= R_{v,\alpha+\beta}\\
(M_{v,\beta})^2 &= i
\end{aligned}$$
and all possible pairwise combinations of $T_u$, $R_{v,\alpha}$, $M_{w,\beta}$ for different values of $u, v, w \in \mathbb{R}^2$ and $\alpha, \beta \in \mathbb{R}$. The details include the possible combinations of translations, rotations, reflections in different directions, rotations round different points, or reflections in different lines. They could be generalised to rigid motions in $\mathbb{R}^3$, and even to higher dimensions in $\mathbb{R}^n$.
Such activities take us beyond the goals of a focus on the foundations of 
mathematics and are postponed for possible study in later courses. 
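The relations listed above can be spot-checked numerically. The sketch below implements $T_u$, $R_{v,\alpha}$ and $M_{v,\beta}$ on complex numbers, with the reflection taken as $z \mapsto v + \overline{(z - v)}e^{2i\beta}$, and compares maps at a random sample of points (the seed, sample size and tolerance are arbitrary assumptions). A numerical check on samples illustrates the identities; it does not prove them.

```python
import cmath
import random

def T(u):
    return lambda z: z + u

def R(v, alpha):
    return lambda z: v + (z - v) * cmath.exp(1j * alpha)

def M(v, beta):
    return lambda z: v + (z - v).conjugate() * cmath.exp(2j * beta)

def same_map(f, g, samples, tol=1e-9):
    return all(abs(f(z) - g(z)) < tol for z in samples)

def preserves_distance(f, samples, tol=1e-9):
    return all(abs(abs(f(p) - f(q)) - abs(p - q)) < tol
               for p in samples for q in samples)

rng = random.Random(0)            # fixed seed so the run is reproducible

def rand_point():
    return complex(rng.uniform(-5, 5), rng.uniform(-5, 5))

samples = [rand_point() for _ in range(20)]
u, w, v = rand_point(), rand_point(), rand_point()
a, b = rng.uniform(0, 6.28), rng.uniform(0, 6.28)
identity = lambda z: z

checks = {
    "T_0 = i": same_map(T(0), identity, samples),
    "T_u o T_w = T_(u+w)": same_map(lambda z: T(u)(T(w)(z)), T(u + w), samples),
    "R_(v,0) = i": same_map(R(v, 0), identity, samples),
    "R_(v,a) o R_(v,b) = R_(v,a+b)":
        same_map(lambda z: R(v, a)(R(v, b)(z)), R(v, a + b), samples),
    "(M_(v,b))^2 = i": same_map(lambda z: M(v, b)(M(v, b)(z)), identity, samples),
    "all three are rigid": all(preserves_distance(f, samples)
                               for f in (T(u), R(v, a), M(v, b))),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```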


The Way Ahead 


The theory of groups developed in this chapter can be extended to apply in 
many areas of mathematics using the notion of symmetry as a bijection of 
a set that preserves certain kinds of structure. A symmetry could relate to a 
shape (rigid motion), to an algebraic formula (Galois group), it could be a 
property like ‘being a solution of a specific differential equation’. 

In many applications of mathematics, the symmetries of a system tell us 
a lot about the system itself. For example, the symmetries of a drum impose 
constraints on its vibrational frequencies, and the symmetries of a growing 
organism affect the shapes it can take up. Deep areas of physics turned out to 
be governed by the symmetries of the basic equations of relativity and quan- 
tum mechanics. Modern particle physics, up to and including the recently 
discovered Higgs boson, builds on the study of such symmetries. 

Pure mathematics also benefits from generalising the ideas of this chapter 
to more general algebraic structures. Algebraic structures may have sev- 
eral operations, some of which have a group structure relating to theorems 
proved here. For example, rings, fields, and vector spaces include a commu- 
tative operation of addition and this may interact with other operations such 
as multiplication in a ring or field or operations by scalar quantities in vector 
spaces. 

In all these cases, the additive structure is commutative, so additive subgroups are normal and have additive quotient structures, such as $\mathbb{Z}/n\mathbb{Z} \cong \mathbb{Z}_n$. In this particular case, multiplication also works in $\mathbb{Z}_n$, to make it a ring (and also a field if $n$ is prime). In this example, we not only have $n\mathbb{Z}$ closed under multiplication, we can also multiply any element $nk \in n\mathbb{Z}$ by any element $m \in \mathbb{Z}$ to get the product $mnk$, also in $n\mathbb{Z}$. This turns out to be the fundamental property for introducing quotient structures in ring theory.


Definition 13.38: If $R$ is a ring and $I$ is a subgroup under addition then $I$ is an ideal if $x \in R$, $y \in I$ implies $xy \in I$.³


3 Here, as elsewhere in this book, when we don’t say otherwise, we speak of a ring with 
commutative multiplication. In a non-commutative ring, we would require both xy and yx 
to belong to I. 
An ideal is not only closed under multiplication of elements within it, but also under multiplication by an element from the whole ring.
Combining the properties of addition and multiplication, we have 


Equivalent Definition 13.39: If R is a ring and I is a non-empty subset 
of R, then I is an ideal if 


(i) $x, y \in I$ implies $x - y \in I$
(ii) $x \in R$, $y \in I$ implies $xy \in I$.


Example 13.40: nZ is an ideal in Z. 


In ring theory, it is possible to define quotient structures $R/I$ for a ring $R$ and an ideal $I$ in $R$ using the same techniques as for a group and a normal subgroup. For example, if $I$ is an ideal in the ring $R$, then because addition is commutative, the cosets written additively as $x + I$ or $I + x$ are equal and, by the structure theorem for groups, addition may be defined on $R/I$ by
$$(x + I) + (y + I) = (x + y) + I \tag{13.7}$$
so that $R/I$ is an additive group.
We can also define multiplication by
$$(x + I)(y + I) = xy + I. \tag{13.8}$$


Theorem 13.41: If $R$ is a ring and $I$ is an ideal in $R$, then $R/I$ is a ring where addition and multiplication are given by (13.7) and (13.8).


Proof: Because R is a commutative group under addition, R/I is already 
known to be a commutative group under addition. We need to check that 
multiplication is well defined and satisfies the associative, commutative, and 
distributive laws. These are all straightforward. 

If $x + I = x' + I$ and $y + I = y' + I$, then
$$xy - x'y' = xy - x'y + x'y - x'y' = (x - x')y + x'(y - y').$$
By the definition of an ideal,
$$x - x' \in I,\ y \in R \Rightarrow (x - x')y \in I, \qquad x' \in R,\ y - y' \in I \Rightarrow x'(y - y') \in I.$$
Hence $xy - x'y' \in I$ and so $xy + I = x'y' + I$.
Thus multiplication is well defined. The multiplicative identity is $1 + I$ and the associative, commutative, and distributive laws follow from the corresponding properties in $R$.
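The well-definedness step of this proof can be watched in action for the special case $R = \mathbb{Z}$, $I = n\mathbb{Z}$. The sketch below (with $n = 12$, the sampling ranges and the seed all chosen arbitrarily) picks random representatives $x, y$, replaces them by other representatives $x', y'$ of the same cosets, and confirms both the algebraic identity used above and that $xy + I = x'y' + I$ and $(x + y) + I = (x' + y') + I$.

```python
import random

def same_coset(x, y, n):
    # x + nZ = y + nZ exactly when x - y lies in the ideal nZ
    return (x - y) % n == 0

rng = random.Random(1)            # fixed seed for reproducibility
n = 12
well_defined = True
for _ in range(1000):
    x, y = rng.randint(-100, 100), rng.randint(-100, 100)
    # other representatives x' = x + n*s and y' = y + n*t of the same cosets
    x2 = x + n * rng.randint(-20, 20)
    y2 = y + n * rng.randint(-20, 20)
    # the identity from the proof: xy - x'y' = (x - x')y + x'(y - y')
    assert x * y - x2 * y2 == (x - x2) * y + x2 * (y - y2)
    well_defined &= same_coset(x * y, x2 * y2, n) and same_coset(x + y, x2 + y2, n)
print(well_defined)
```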
Example 13.42: $\mathbb{Z}_n = \mathbb{Z}/n\mathbb{Z}$ is a ring and if $n = p$ is prime, then it is a field.
You should verify these properties for yourself.
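One way to explore the field property experimentally is to test, for each $n$, whether every non-zero residue has a multiplicative inverse. The sketch below does this by exhaustive search for small $n$ (the range is an arbitrary choice) and compares the result with primality. Such a search illustrates the statement for those $n$; it is no substitute for the proof.

```python
from math import isqrt

def units(n):
    """Residues 1..n-1 that have a multiplicative inverse modulo n."""
    return [a for a in range(1, n) if any(a * b % n == 1 for b in range(1, n))]

def is_field(n):
    # Z_n (n >= 2) is a commutative ring with 1; it is a field exactly
    # when every non-zero element has an inverse
    return len(units(n)) == n - 1

def is_prime(n):
    return n >= 2 and all(n % k for k in range(2, isqrt(n) + 1))

print([n for n in range(2, 30) if is_field(n)])
print(all(is_field(n) == is_prime(n) for n in range(2, 60)))
```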


This use of quotient rings will prove to be insightful in chapter 15. 


Exercises 


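1. Express the following permutations in disjoint cycle form:
(i) $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 3 & 2 & 5 & 1 & 4 \end{pmatrix}$
(ii) $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 6 & 5 & 4 & 3 & 2 & 1 \end{pmatrix}$
(iii) $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 5 & 9 & 7 & 4 & 1 & 3 & 6 & 2 & 8 \end{pmatrix}$
(iv) (12)(12)(145)(23).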
2. Express the following permutations in standard form:
(i) (1234)(23)(12)
(ii) (1235)(43)
(iii) (43)(34)(123)
(iv) (12)(13)(12)(143)(2).


3. Find $\sigma^{-1}$ in both standard and disjoint cycle form where $\sigma$ is:


, (1234 
G) (; 4 3 `) 
(ii) (1234)(56)
(iii) (12)(12)(12)(13)(14).


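4. Calculate the products $\sigma\tau$ and $\tau\sigma$, using the convention that permutations are written on the left (i.e. $\sigma\tau$ is $\tau$ followed by $\sigma$):
(i) $\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 4 & 2 & 3 & 1 & 5 \end{pmatrix}$, $\tau = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 3 & 4 & 1 & 5 & 2 \end{pmatrix}$
(ii) $\sigma = (123)$, $\tau = (23)$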
ae (123 4 
(iii) o = (1234), t = ( i kas | 
5. Prove that if $X$ is a finite set with $n$ elements, then the number of bijections from $X$ to itself is $n! = n(n-1)(n-2)\cdots 3 \cdot 2 \cdot 1$.
Hint: Prove by induction the slightly more general theorem that if $X$, $Y$ are finite sets, each having $n$ elements, then the number of bijections from $X$ to $Y$ is $n!$.
6. Write out the axioms for a group. Which of the following sets is a group under the given operation? If it is, specify the identity element and the inverse of each element; if not, give one reason why it fails.
(i) $\mathbb{Z}$ under addition
(ii) $\mathbb{Z}$ under multiplication
(iii) $\mathbb{R}$ under addition
(iv) $\mathbb{R}$ under multiplication
(v) $\mathbb{R}^+$, the positive real numbers, under multiplication
(vi) $\mathbb{Z}_5$, the integers modulo 5, under addition
(vii) $\mathbb{Z}_5$, the integers modulo 5, under multiplication.

7. Show that the set $\{1_5, 2_5, 3_5, 4_5\}$ forms a group under multiplication modulo 5. Show that $\{1_8, 3_8, 5_8, 7_8\}$ forms a group under multiplication modulo 8. Find the largest subset of the integers modulo 12 that is a group under multiplication modulo 12. In each case, write out the multiplication table.


8. The set $\{1_3, 2_3\}$ of non-zero integers modulo 3 forms a group under multiplication modulo 3, but the set $\{1_4, 2_4, 3_4\}$ of non-zero integers modulo 4 does not form a group under multiplication modulo 4. Explain why, and investigate what happens to the set $\{1_n, 2_n, \ldots, (n-1)_n\}$ of non-zero integers modulo $n$.


9. Show that $\mathbb{Z}_7^*$, the set of non-zero elements modulo 7, is of the form $\{1_7, a, a^2, \ldots, a^5\}$ for $a = 3_7$. Deduce that, for any integer $n$, either $n^6 \equiv 0 \pmod 7$ or $n^6 \equiv 1 \pmod 7$.


10. Show that if $n$ is prime, then for any $1 \le k < n$, the elements $1_n k_n, 2_n k_n, \ldots, (n-1)_n k_n$ are all different. Hence, or otherwise, prove that the non-zero elements in $\mathbb{Z}_n$ form a group if and only if $n$ is a prime.

11. Find all subgroups of $S_4$.

12. Prove that the complex $n$th roots of unity, satisfying $\omega^n = 1$, form a cyclic group of order $n$ under multiplication. Relate this fact to the rotational symmetries of a regular $n$-gon.


13. Let $S$ be a non-empty set of permutations of a finite set $X$ satisfying the closure property:
If $\sigma, \tau \in S$ then $\sigma \circ \tau \in S$.
Prove that if $X$ is finite then the following properties also hold:
The identity $i_X \in S$.
If $\sigma \in S$ then $\sigma^{-1} \in S$.
Hence deduce that a non-empty set $S$ of permutations of a finite set $X$ satisfying the closure property is a group.
14. Let $A$ be a finite set and $P$ be the set of subsets of $A$. Let the operation $\triangle$ on $P$ be the symmetric difference: $X \triangle Y = \{x \in X \cup Y \mid x \notin X \cap Y\}$. Show that $P$ is a group under the operation $\triangle$, with identity element $\emptyset$. What is the inverse of $X$? What happens if $A = \emptyset$?

15. If $a, b$ are any two elements of a group, show that $a^{-1}b^{-1} = (ba)^{-1}$. Hence, or otherwise, show that if $G$ is a group such that $x^2$ is the identity for every $x \in G$, then $G$ is abelian (i.e. that $ab = ba$ for all $a, b \in G$).

16. Prove that $H$ is a subgroup of $G$ if and only if $H \neq \emptyset$ and $HH^{-1} = H$.
17. Find an example to show that if a subgroup $H$ is not normal then the product of two cosets $gH$ and $kH$ of $H$ need not be a coset of $H$.
18. Suppose that $M, N \trianglelefteq G$ and $M$ is a subgroup of $N$. Prove that $M \trianglelefteq N$ and $(G/M)/(N/M) \cong G/N$.
Hint: Prove that the composition of two homomorphisms is a homomorphism, and then consider the corresponding quotient groups.

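19. Using complex numbers, define the rotation $\rho_\alpha$ around the origin turning through an angle $\alpha$, moving $z = x + iy$ to $\rho_\alpha(z) = e^{i\alpha}z$.
Define the mirror image $\mu_\beta$ in a line through the origin making an angle $\beta$ with the horizontal axis by $\mu_\beta(z) = e^{2i\beta}\bar{z}$.
Prove that these definitions agree with the ones given in chapter 11, and prove the following properties:
$$\begin{aligned}
\rho_0 &= i\\
\rho_\alpha \circ \rho_\beta &= \rho_{\alpha+\beta}\\
\mu_0(x, y) &= (x, -y)\\
\rho_\alpha \circ \mu_\beta &= \mu_{\beta+\alpha/2}\\
\mu_\beta \circ \rho_\alpha &= \mu_{\beta-\alpha/2}
\end{aligned}$$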
CHAPTER 14


Cardinal Numbers 


What is infinity? When some first-year university students were asked this question, the consensus answer was 'something bigger than any natural number'. In a precise sense, this is correct; one of the triumphs of set theory is that the concept of infinity can be given a clear interpretation. However, there is a surprise: when we compare the sizes of sets, we find not one infinity, but many: a vast hierarchy of infinities. This discovery came about by reformulating the question. Instead of asking 'how many' elements there are in a given set and using counting, it is much more profitable to compare two sets, and ask if there are as many elements in one of them as there are in the other. This idea can be made precise by saying that sets $A$ and $B$ have 'the same number of elements' if there is a bijection $f : A \to B$.
Rather than beginning with the full hierarchy of infinities, let's begin with what turns out to be the smallest. Here the standard set, for comparison purposes, is the natural numbers $\mathbb{N}$. It is useful to consider $\mathbb{N}$ rather than $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$ because a bijection $f : \mathbb{N} \to B$ organises the elements of $B$ into a sequence; we can call $f(1)$ the first element of $B$ using this bijection, $f(2)$ the second, and so on. Using this process we set up a method for counting $B$. Of course, if we actually say the elements one after another using this bijection, '$f(1), f(2), \ldots$', we never reach the end, but we do know that if $b \in B$ then $b = f(n)$ for some $n \in \mathbb{N}$, so we reach that particular element eventually.
Recall from chapter 8 that we defined $\mathbb{N}(0) = \emptyset$, and for $n \in \mathbb{N}$,
$$\mathbb{N}(n) = \{m \in \mathbb{N} \mid 1 \le m \le n\}.$$


Definition 14.1: A set $X$ is finite if there exists a bijection $f : \mathbb{N}(n) \to X$ for some $n \in \mathbb{N}_0$. A set $X$ is countable if either $X$ is finite or there exists a bijection $f : \mathbb{N} \to X$. If there is a bijection $f : \mathbb{N}(n) \to X$, then we say that $X$ has $n$ elements.
Finding such a bijection for a finite set is just the usual process of counting.
Why not generalise to infinite sets too? We can get started: 


Definition 14.2: If there is a bijection $f : \mathbb{N} \to X$ then $X$ has $\aleph_0$ elements. We say that $X$ is countably infinite.


The symbol $\aleph$ is the first letter 'aleph' of the Hebrew alphabet, and $\aleph_0$ is our first example of a new concept of number, used to state how big an infinite set is. If there is a bijection between $\mathbb{N}$ and $X$, it makes sense to say that '$X$ has the same (cardinal) number of elements as $\mathbb{N}$.' That number is given a new symbol, $\aleph_0$.

Before discussing cardinal numbers in general, we take a closer look at the 
notion of countability. 


Example 14.3: $\mathbb{N}_0$ is countable. Define $f : \mathbb{N} \to \mathbb{N}_0$ by $f(n) = n - 1$; then $f$ is a bijection. This is the first fascinating property of this method of 'counting infinite sets'. $\mathbb{N}$ is a proper subset of $\mathbb{N}_0$, so intuitively it should have fewer elements, yet in the sense of a bijection between the sets, they have the same size.


Galileo gave an even more graphic example in 1638: 


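Example 14.4 (Galileo): There is a correspondence between the natural numbers and the perfect squares:
$$\begin{array}{ccccccc}
1 & 2 & 3 & 4 & \cdots & n & \cdots\\
\updownarrow & \updownarrow & \updownarrow & \updownarrow & & \updownarrow & \\
1 & 4 & 9 & 16 & \cdots & n^2 & \cdots
\end{array}$$
In modern set-theoretic terms, if $S = \{n^2 \in \mathbb{N} \mid n \in \mathbb{N}\}$, the map $f : \mathbb{N} \to S$ given by $f(n) = n^2$ is a bijection.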


This result is very curious, because we get the squares from N by remov- 
ing all of the numbers that are not square. There are infinitely many of 
these; moreover, squares get thinner on the ground as we progress to larger 
numbers. Intuitively, a ‘random’ natural number is probably not a perfect 
square. 
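Both sides of Galileo's puzzle can be seen side by side in a short computation. The sketch below (function names our own) checks on an initial segment of $\mathbb{N}$ that $n \mapsto n^2$ and the integer square root undo each other, and then counts the squares among $1, \ldots, N$, which are $1^2, 2^2, \ldots, \lfloor\sqrt{N}\rfloor^2$, so that their proportion shrinks as $N$ grows.

```python
from math import isqrt

def f(n):
    return n * n                      # Galileo's correspondence n <-> n^2

def f_inverse(s):
    r = isqrt(s)
    if r * r != s:
        raise ValueError(f"{s} is not a perfect square")
    return r

# f and f_inverse undo each other on an initial segment of N
round_trip = all(f_inverse(f(n)) == n for n in range(1, 10_001))

def squares_up_to(N):
    # the squares in 1..N are 1^2, 2^2, ..., isqrt(N)^2
    return isqrt(N)

for N in (100, 10_000, 1_000_000):
    print(N, squares_up_to(N), squares_up_to(N) / N)
print(round_trip)
```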

For over two centuries, this seeming contradiction blighted any attempt 
to contemplate infinity in a precise sense. Leibniz went as far as to suggest 
that we should only ever consider finite sets—that the apparent contradiction 
arose because the natural numbers are infinite. His resolution of the conflict 
was that if we consider only finite sets of natural numbers, say the numbers from 1 to 100, there is no correspondence between these natural numbers and those of them that are squares. Indeed, of the 100 numbers in this range exactly 10 are squares.

This is a bit restrictive. It rules out any sensible concept of 'number' for infinite sets. Georg Cantor realised that we can do better. His solution of the paradox in the 1870s was even more dramatic. He showed that if we interpret 'as many' to mean that there is a bijection between two sets, then any infinite set has 'as many' elements as a proper subset! Here 'infinite' is interpreted in the technical sense that $B$ is infinite if there is no bijection $f : \mathbb{N}(n) \to B$ for any $n \in \mathbb{N}_0$. From Cantor's point of view, there is no paradox; just a
counterintuitive theorem. As we have said many times: when you generalise 
a mathematical concept, some of its original properties may no longer be 
true. 


Proposition 14.5 (Cantor): If a set $B$ is infinite, then there exists a proper subset $A \subset B$ and a bijection $f : B \to A$.


Proof: First, choose a countably infinite subset $X$ of $B$. Since no bijection exists between $\mathbb{N}(0)$ and $B$, $B$ is non-empty and there exists some element in $B$ which we call $x_1$. Define $g : \mathbb{N} \to B$ inductively by $g(1) = x_1$, and if distinct elements $x_1, x_2, \ldots, x_n$ have been found, then since $g$ cannot give a bijection $g : \mathbb{N}(n) \to B$, there must be another element, which we name $x_{n+1} \in B$, that is distinct from $x_1, \ldots, x_n$. Define $g(n+1) = x_{n+1}$. Let
$$X = \{x_n \in B \mid n \in \mathbb{N}\}.$$
Let $A = B \setminus \{x_1\}$, and define $f : B \to A$ by
$$f(x_n) = x_{n+1} \text{ for } x_n \in X$$
and
$$f(b) = b \text{ for } b \notin X.$$
Then $f$ is a bijection.


We can do better than this. We can start with an infinite set B and re- 
move an infinite subset to leave a subset C with a bijection from B to C. For 
example, if we take the set N of natural numbers, then the sets E of even 
numbers and O of odd numbers allow us to define bijections 
$$f : \mathbb{N} \to E \text{ where } f(n) = 2n,$$
$$g : \mathbb{N} \to O \text{ where } g(n) = 2n - 1.$$
If we start with the infinite set $\mathbb{N}$ and remove the infinite subset $O$ then this leaves the infinite subset $E$ and a bijection $f : \mathbb{N} \to E$.

More generally, for any infinite set $B$, we can remove an infinite subset and still be left with a subset $A$ with a bijection $f : B \to A$. To do this, choose a countably infinite subset $X$ of $B$ as in the proof of proposition 14.5. Let $Y$ be the subset $\{x_n \mid n \text{ is odd}\}$ and let $A$ be the subset of $B$ with the elements of $Y$ removed:
$$A = B \setminus Y.$$
Define $f : B \to A$ by
$$f(x_n) = x_{2n} \text{ for } x_n \in X, \quad \text{and} \quad f(b) = b \text{ for } b \notin X.$$
Then $f$ is a bijection which maps all of $B$ onto $A$.
We can therefore start with any infinite set $B$, remove a (countably) infinite subset $Y$ and still be left with a subset $A$ which has 'as many elements' as $B$!
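Both constructions, the shift $x_n \mapsto x_{n+1}$ of proposition 14.5 and the doubling $x_n \mapsto x_{2n}$, can be simulated on a finite window. In this sketch the element $x_n$ is represented as the tuple `("x", n)`, and a few strings play the role of hypothetical elements of $B$ outside $X$; the window size is arbitrary. A finite window can only illustrate the infinite argument: it checks injectivity, that the removed elements are never hit, and that every element of $A$ in the window has a preimage.

```python
def is_x(b):
    return isinstance(b, tuple) and b[0] == "x"

def shift(b):
    """Proposition 14.5: x_n -> x_(n+1) on X, identity elsewhere (onto B minus {x_1})."""
    return ("x", b[1] + 1) if is_x(b) else b

def double(b):
    """x_n -> x_(2n) on X, identity elsewhere (onto B minus {x_n : n odd})."""
    return ("x", 2 * b[1]) if is_x(b) else b

N = 1000
extras = ["p", "q", "r"]          # hypothetical elements of B lying outside X
window = [("x", n) for n in range(1, N + 1)] + extras

results = {}
for f, removed in ((shift, lambda n: n == 1), (double, lambda n: n % 2 == 1)):
    images = [f(b) for b in window]
    injective = len(set(images)) == len(images)
    avoids_removed = not any(is_x(y) and removed(y[1]) for y in images)
    A_part = [b for b in window if not (is_x(b) and removed(b[1]))]
    covers_A = set(A_part) <= set(images)
    results[f.__name__] = (injective, avoids_removed, covers_A)
print(results)
```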


Cantor’s Cardinal Numbers 


Cantor’s solution to the problem ‘how many elements?’ for infinite sets was 
to introduce the concept of a cardinal number. For the moment we assume that for every set $X$ there is an associated cardinal number, more briefly called a cardinal, with the property that if there is a bijection $f : X \to Y$, then $X$ and $Y$ have the
same cardinal, and if there is no bijection, then the cardinals concerned are 
different. We denote the cardinal of X by |X|. 

We haven’t yet said what cardinals are, just what they do. To place them 
on a firm basis, we have to construct them set-theoretically. Cantor didn't get
that far, and neither will we. However, it can be done. 

In the case of finite sets, a convenient candidate for the cardinal number is close at hand. If there is a bijection $f : \mathbb{N}(n) \to X$, the cardinal of $X$ is $n$. Likewise, given a bijection $f : \mathbb{N} \to X$, the cardinal of $X$ is $\aleph_0$. For other infinite sets, we may have to invent new symbols for their cardinals. In general we denote the cardinal of $X$ by $|X|$, on the understanding that if there is a bijection $f : X \to Y$, then $|X| = |Y|$. If there exists an injection $f : X \to Y$, we say that $|X| \le |Y|$. As usual, we define $|X| < |Y|$ to mean $|X| \le |Y|$ and $|X| \ne |Y|$.
In general, if $X$ is a subset of $Y$, then the inclusion $i : X \to Y$, $i(x) = x$, is an injection, so we have
$$X \subseteq Y \Rightarrow |X| \le |Y|.$$
Proposition 14.5 says that for any infinite set $B$ there exists a proper subset $A$ such that $|A| = |B|$. Thus for infinite sets,
$$X \subsetneq Y \not\Rightarrow |X| < |Y|.$$


The dilemma posed by Galileo’s example is not so much mathematical as 
psychological. When we extend the system of natural numbers and counting 
to embrace infinite cardinals, the larger system need not have all of the prop- 
erties of the smaller one. However, familiarity with the smaller system leads 
us to expect certain properties, and we can become confused when the pieces 
don’t seem to fit. Insecurity arose when the square of a complex number vio- 
lated the real number principle that all squares are positive. This was resolved 
when we realised that the complex numbers cannot be ordered in the same 
way as their subset of reals. Likewise we resolve the seeming contradiction 
that Galileo discovered by realising that when we interpret ‘same cardinal’ in 
terms of a bijection between sets, proper inclusion of A in B does not prevent 
A and B from having the same infinite cardinal. 

We return to the notion of countability. Given any infinite set $B$, as in the proof of proposition 14.5, we can select a countably infinite subset $X \subseteq B$. This means that $\aleph_0 = |X| \le |B|$, so $\aleph_0$ is the smallest infinite cardinal. Surprisingly, many familiar sets that seem much bigger than $\mathbb{N}$ also have cardinality $\aleph_0$.


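Example 14.6: The integers are countable. Define $f : \mathbb{N} \to \mathbb{Z}$ by
$$f(2n) = n, \quad f(2n-1) = 1 - n \quad \text{for } n \in \mathbb{N};$$
then we get the bijection
$$\begin{array}{cccccccc}
1 & 2 & 3 & 4 & 5 & 6 & 7 & \cdots\\
\updownarrow & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \\
0 & 1 & -1 & 2 & -2 & 3 & -3 & \cdots
\end{array}$$
Although $f$ is a bijection, it doesn't preserve the order (in the sense that $m < n$ does not imply $f(m) < f(n)$; for instance $f(2) > f(3)$). When we set up bijections between sets with an order on them, we may have to do it in a very higgledy-piggledy way, as the next example shows.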
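The bijection of example 14.6 and its inverse are easy to write down explicitly. In this sketch the inverse sends a positive integer $z$ to the even position $2z$ and sends $0$ or a negative $z$ to the odd position $1 - 2z$ (our own derivation from the definition of $f$); checking that the two maps undo each other on a finite range illustrates, without proving, that $f$ is a bijection.

```python
def f(m):
    """f(2n) = n, f(2n - 1) = 1 - n: the map N -> Z of example 14.6."""
    if m % 2 == 0:
        return m // 2
    return 1 - (m + 1) // 2

def f_inverse(z):
    # positive integers sit at even positions, 0 and negatives at odd ones
    return 2 * z if z > 0 else 1 - 2 * z

print([f(m) for m in range(1, 8)])
print(all(f_inverse(f(m)) == m for m in range(1, 10_001)))
print(all(f(f_inverse(z)) == z for z in range(-5_000, 5_001)))
```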
Example 14.7: The rationals are countable.

We'll prove this in stages, first by counting the positive rationals. A positive 
rational is p/q, where p and q are natural numbers. One way of counting the 
rationals is to think of them written out as an array: 


1/1 1/2 1/3 1/4... 
2/1 2/2 2/3 2/4... 
3/1 3/2 3/3 3/4... 
4/1 4/2 4/3 4/4... 


Now read them off along the ‘cross diagonals’, first 1/1, next 1/2, 2/1, then 
1/3, 2/2, 3/1, and so on: 


Fig. 14.1 Counting the positive rationals 


This process strings the positive rationals out as a list 1/1, 1/2, 2/1, 1/3, 2/2, 
3/1, .... However, this list includes repeats, because 1/1 = 2/2 and later on 
we get 3/3, 4/4, and so on. Similarly 1/2 = 2/4 = 3/6 = .... That prevents 
the construction of a bijection. So we consider each element in the list in 
turn, and delete it if it has occurred before. That leaves 1/1, 1/2, 2/1, 1/3, 
3/1, .... Suppose that the $n$th rational in the remaining sequence is $a_n$. Then
the function $f$ from the natural numbers to the positive rationals for which
$f(n) = a_n$ is a bijection. Now we include negative rationals as well: the list
$0, a_1, -a_1, a_2, -a_2, \ldots, a_n, -a_n, \ldots$
includes every rational precisely once. So the map $g : \mathbb{N} \to \mathbb{Q}$ given by

$$g(1) = 0, \quad g(2n) = a_n, \quad g(2n + 1) = -a_n \quad \text{for } n \in \mathbb{N}$$

is a bijection, as required.
Although we have not given an explicit formula for g(n), we have given an 
explicit prescription for it. The first few terms are 
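$$\begin{array}{cccccccccccc}
1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & \cdots \\
\downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \\
0 & 1 & -1 & \tfrac12 & -\tfrac12 & 2 & -2 & \tfrac13 & -\tfrac13 & 3 & -3 & \cdots
\end{array}$$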


and you should be able to continue as far as you wish. Later we develop a 
more powerful result, the Schröder-Bernstein theorem, which lets us prove 
that two sets have the same cardinal without constructing an explicit bi- 
jection. By invoking the theorem, we can deal with the rationals more 
cleanly. 
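The prescription for $g$ can be followed mechanically. This Python sketch (the generator names are hypothetical) walks the array along the cross diagonals $p + q = 2, 3, 4, \ldots$, deletes any fraction that has already appeared, and then interleaves signs exactly as in the definition of $g$. It uses exact `Fraction` arithmetic, so that, for example, $2/2$ is recognised as a repeat of $1/1$.

```python
from fractions import Fraction
from itertools import count, islice


def positive_rationals():
    """a_1, a_2, ...: read p/q along cross diagonals p + q = 2, 3, 4, ...
    (1/1; then 1/2, 2/1; then 1/3, 2/2, 3/1; ...), deleting repeats."""
    seen = set()
    for s in count(2):
        for p in range(1, s):
            r = Fraction(p, s - p)
            if r not in seen:        # delete a term that has occurred before
                seen.add(r)
                yield r


def g_sequence():
    """g(1) = 0, g(2n) = a_n, g(2n + 1) = -a_n."""
    yield Fraction(0)
    for a in positive_rationals():
        yield a
        yield -a


first = list(islice(g_sequence(), 11))
```

The list `first` agrees with the table of the first eleven terms given above. Keeping a set of earlier terms is the literal version of "delete it if it has occurred before"; keeping only fractions already in lowest terms would give the same list.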


The reason why we allowed ‘countable’ to include ‘finite’ as well as 
‘countably infinite’ is the next result: 


Proposition 14.8: A subset of a countable set is countable. 


Proof: If $A$ is finite, then so is $B$. Otherwise, given a bijection $f : \mathbb{N} \to A$ and $B \subseteq A$, either $B$ is finite, or we can
define $g : \mathbb{N} \to B$ by

$g(1) = f(m_1)$, where $m_1$ is the least $m$ such that $f(m) \in B$;
having found $g(1), \ldots, g(n)$, put
$g(n + 1) = f(m)$, where $m$ is the least natural number such that $f(m) \in B \setminus \{g(1), \ldots, g(n)\}$.

Informally, this just amounts to writing out the elements of $A$ as a list

$$f(1),\ f(2),\ f(3),\ \ldots,$$

deleting those terms not in $B$, and leaving the terms in $B$ listed in the same
order.
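The construction in this proof is a filter applied to the list $f(1), f(2), \ldots$. Here is a minimal Python sketch, using the hypothetical names `enumerate_subset` and `in_B`. It assumes that $B$ is infinite, since for finite $B$ the loop would go on searching after the last element.

```python
import math
from itertools import count, islice


def enumerate_subset(f, in_B):
    """Yield g(1), g(2), ...: the terms of f(1), f(2), f(3), ... lying in B, in order.

    Because f is a bijection no term repeats, so 'the least m with f(m) in
    B minus {g(1), ..., g(n)}' is simply the next m with f(m) in B."""
    for m in count(1):
        x = f(m)
        if in_B(x):
            yield x


def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))


# Example: A = N with f the identity, and B the set of primes.
primes = list(islice(enumerate_subset(lambda m: m, is_prime), 10))
```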


The remarkable fact about countable sets is that we can build up sets from 
them that seem a lot bigger, but once more are countable, in the following 
precise sense: 


Proposition 14.9: A countable union of countable sets is countable. 


Proof: Given a countable collection of sets, we can use $\mathbb{N}$ as the index set and
write the sets as $\{A_n\}_{n \in \mathbb{N}}$. (If there is only a finite number of sets, $A_1, \ldots, A_k$,
put $A_n = \emptyset$ for $n > k$.) Since each $A_n$ is countable, we can write the elements
of $A_n$ as a list $a_{n1}, a_{n2}, \ldots, a_{nr}, \ldots$, which terminates if $A_n$ is finite but is an
infinite sequence if $A_n$ is countably infinite. Now tabulate the elements of
$\bigcup_{n \in \mathbb{N}} A_n$ as a rectangular array, and read them off along the cross diagonals as
in the previous example:

Fig. 14.2 Counting down successive cross diagonals


There may be gaps in the array because some of the sets are finite (as in 
row three of the above illustration) or because there is only a finite number of 
sets. There may be repeats when two sets $A_n$, $A_m$ have elements in common,
so an element in row $n$ is repeated in row $m$. We just pass over the gaps and
delete elements that have occurred earlier. The list is then either finite, or an
infinite sequence with no repeats. This shows that $\bigcup_{n \in \mathbb{N}} A_n$ is countable.
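The reading procedure of this proof can be sketched in Python. We make one assumption for illustration: the array is given by a hypothetical function `entry(n, r)` that returns $a_{nr}$, or `None` when $A_n$ has fewer than $r$ elements. Diagonal $s$ is read downwards through the entries with $n + r = s$, passing over gaps and deleting repeats. If the union is finite, the generator keeps searching after its last element, so take only finitely many terms.

```python
from fractions import Fraction
from itertools import count, islice


def countable_union(entry):
    """Yield the elements of the union of A_1, A_2, ... without repeats.

    entry(n, r) returns a_{nr}, or None if A_n has fewer than r elements."""
    seen = set()
    for s in count(2):                  # cross diagonal n + r = s
        for n in range(1, s):           # counting down the diagonal
            x = entry(n, s - n)
            if x is not None and x not in seen:
                seen.add(x)
                yield x


# Example: A_n = {1/n, 2/n, ..., n/n}, all finite and overlapping;
# the union is the set of rationals in ]0, 1].
def entry(n, r):
    return Fraction(r, n) if r <= n else None


first = list(islice(countable_union(entry), 6))
```

Here `first` is $1, \tfrac12, \tfrac13, \tfrac23, \tfrac14, \tfrac15$: the gaps in short rows are skipped and repeats such as $2/2$ and $2/4$ are deleted.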


Proposition 14.10: The cartesian product of two countable sets is 
countable. 


Proof: If $A$ and $B$ are countable, write the elements of $A$ as a sequence
$a_1, a_2, \ldots, a_n, \ldots$ (which terminates if $A$ is finite). Similarly, write the elements
of $B$ as $b_1, b_2, \ldots, b_m, \ldots$. Now write the elements of $A \times B$ as a
rectangular array and read them off along the cross diagonals:


Fig. 14.3 Counting ordered pairs 
If either $A$ or $B$ is finite, there are gaps that should be passed over; if
both are finite, then $A \times B$ is finite. If both are infinite, an explicit bijection
$f : \mathbb{N} \to A \times B$ is not hard to write down. There is one element in the
first cross diagonal, two in the second, and, in general, $n$ in the $n$th. So there
are $1 + 2 + \cdots + n = \tfrac12 n(n+1)$ elements in the first $n$. The $r$th element in
the next cross diagonal is $(a_r, b_{n+2-r})$, so an explicit formula for the bijection
$f : \mathbb{N} \to A \times B$ is

$$f(m) = (a_r, b_{n+2-r}) \quad \text{for } m = \tfrac12 n(n+1) + r \quad (n \ge 0,\ 1 \le r \le n+1).$$
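Inverting $m = \tfrac12 n(n+1) + r$ is a small integer computation. In this Python sketch (the name `pair_indices` is ours) the largest triangular number below $m$ is found with `math.isqrt`, and the sketch returns the index pair $(r, n + 2 - r)$, meaning that $f(m) = (a_r, b_{n+2-r})$.

```python
import math


def pair_indices(m: int) -> tuple:
    """Return (i, j) such that the m-th pair in the cross-diagonal listing is (a_i, b_j).

    Writes m = n(n + 1)/2 + r with n >= 0 and 1 <= r <= n + 1, then returns (r, n + 2 - r)."""
    if m < 1:
        raise ValueError("m must be a natural number")
    n = (math.isqrt(8 * m + 1) - 1) // 2    # largest n with n(n + 1)/2 <= m
    if n * (n + 1) // 2 == m:               # we need n(n + 1)/2 < m so that r >= 1
        n -= 1
    r = m - n * (n + 1) // 2
    return r, n + 2 - r


print([pair_indices(m) for m in range(1, 7)])
```

The first six pairs are $(1,1), (1,2), (2,1), (1,3), (2,2), (3,1)$, the same order in which the positive rationals $p/q$ were read off in Example 14.7.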


An instance of proposition 14.10 in action is: 


Example 14.11: The set of points in the plane with rational coordinates is 
countable. 


At this stage of the game, the reader may be forgiven for thinking that 
every infinite set is countable, but that is not so, as we see by looking at the 
real numbers. 


Example 14.12: The real numbers are not countable. We prove this by
contradiction, by showing that no map $f : \mathbb{N} \to \mathbb{R}$ can be surjective, so there
cannot be a bijection $f : \mathbb{N} \to \mathbb{R}$. Given a map $f : \mathbb{N} \to \mathbb{R}$, express each
$f(m) \in \mathbb{R}$ as a decimal expansion,

$$f(m) = a_m.a_{m1} a_{m2} \ldots a_{mn} \ldots \quad (a_m \in \mathbb{Z},\ a_{mn} \in \mathbb{Z},\ 0 \le a_{mn} \le 9),$$

where, for definiteness, if the decimal terminates, we write it that way, ending
in a sequence of zeros, not a sequence of nines. Now we write down a real
number, different from all the $f(m)$. Let

$$\beta = 0.b_1 b_2 \ldots b_n \ldots$$

where

$$b_n = \begin{cases} 1 & \text{if } a_{nn} = 0, \\ 0 & \text{if } a_{nn} \ne 0. \end{cases}$$

Then $\beta$ is different from $f(n)$ because it differs in the $n$th place. We have
avoided the possible ambiguity that might arise from an infinite sequence
of nines in the expansion, by making sure that the expansion of $\beta$ doesn't
have any.
Let $\aleph$ be the cardinal number of $\mathbb{R}$. Since $\mathbb{N} \subseteq \mathbb{R}$ we have $\aleph_0 \le \aleph$, and the
last example shows that $\aleph_0 \ne \aleph$. So at last we have found a cardinal strictly
bigger than $\aleph_0$.

In fact, for any cardinal we can find a strictly bigger one. The cardinal must 
be associated with some set A. We show that the power set of A always has 
strictly bigger cardinality: 


Proposition 14.13: If $A$ is a set then $|\mathcal{P}(A)| > |A|$.

Proof: Evidently the map $f : A \to \mathcal{P}(A)$ given by $f(a) = \{a\}$ is an injection,
so $|A| \le |\mathcal{P}(A)|$. It remains to show that $|A| \ne |\mathcal{P}(A)|$. To do so, we prove
that no map $f : A \to \mathcal{P}(A)$ can be a surjection. For such a map, $f(a) \in \mathcal{P}(A)$
for each $a \in A$, so $f(a)$ is a subset of $A$. We ask ‘does $a$ belong to the subset
$f(a)$?’ The answer is always ‘yes’ or ‘no’. We select those elements for which
the answer is ‘no’ to get the subset

$$B = \{a \in A \mid a \notin f(a)\}.$$

We claim that $B$ is not mapped onto by any element of $A$ under the function
$f$. For if $B$ were equal to $f(a)$ for some $a \in A$, the question ‘does $a$ belong to
$B$?’ leads to a contradiction:

$$a \in B \Rightarrow a \notin f(a) = B,$$

$$a \notin B \Rightarrow a \in f(a) = B.$$

So $B$ is not mapped onto by $f$ and $f$ is not surjective. Even more so, it cannot
be a bijection.
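For a finite set the argument can be checked exhaustively. This Python sketch takes $A = \{0, 1, 2\}$ as a small example of our own choosing and runs through all $8^3 = 512$ maps $f : A \to \mathcal{P}(A)$. For each one it confirms that the set $B = \{a \in A \mid a \notin f(a)\}$ is not among the values of $f$.

```python
from itertools import chain, combinations, product

A = (0, 1, 2)
# All 2^3 = 8 subsets of A.
subsets = [frozenset(c) for c in chain.from_iterable(combinations(A, k) for k in range(len(A) + 1))]


def diagonal_set(f):
    """B = {a in A | a not in f(a)}, for f given as a dict from A to subsets of A."""
    return frozenset(a for a in A if a not in f[a])


maps_checked = 0
for images in product(subsets, repeat=len(A)):
    f = dict(zip(A, images))
    assert diagonal_set(f) not in f.values()   # B is missed, so f is not surjective
    maps_checked += 1
```

For finite sets this is just the inequality $2^n > n$. The value of the proof is that the same diagonal set $B$ works for every set $A$, finite or infinite.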


Proposition 14.13 leads us to a hierarchy of infinities. We begin with $\aleph_0 = |\mathbb{N}|$.
Then $|\mathcal{P}(\mathbb{N})|$ is strictly bigger, then $|\mathcal{P}(\mathcal{P}(\mathbb{N}))|$, and so on.


The Schröder-Bernstein Theorem


An obvious question to ask concerning the relation $\le$ between cardinals is:
if $|A| \le |B|$ and $|B| \le |A|$, can we conclude that $|A| = |B|$?


The answer to this question is in the affirmative, and the content of this statement
is the Schröder-Bernstein theorem. The proof is trickier than might
seem necessary for such a simple-looking proposition. The main problem is
that $|A| \le |B|$ tells us that there is some injection $f : A \to B$, and $|B| \le |A|$
tells us that there is some injection $g : B \to A$, but these injections need not
be related in any useful way. Nevertheless, somehow we must use them to
construct a bijection between $A$ and $B$. This requires some ingenuity, and it
took a while to find the proof.


Theorem 14.14 (Schröder-Bernstein): Given sets $A$, $B$, then $|A| \le |B|$
and $|B| \le |A|$ implies $|A| = |B|$.

Proof: We have injections $f : A \to B$, $g : B \to A$. We can use $f$ to pass from
$A$ to $B$ or $g$ to pass from $B$ to $A$. Repeating the process, we can pass to and fro,
obtaining $f(a), g(f(a)), f(g(f(a))), \ldots$.


Fig. 14.4 Tracing a chain forwards 


The key to the proof is to try to trace such a chain backwards. Start with
$b \in B$ and see if there exists $a \in A$ such that $f(a) = b$; if such an $a$ exists, it is
unique. Then see if there is a $b_1 \in B$ such that $g(b_1) = a$, then $a_1 \in A$ such
that $f(a_1) = b_1$, attempting to build up a chain $b, a, b_1, a_1, \ldots, b_n, a_n$, where
$f(a_r) = b_r$ and $g(b_r) = a_{r-1}$ (writing $a_0 = a$, $b_0 = b$). In tracing back a chain of elements in this fashion,
three things can happen:


Fig. 14.5 Tracing a chain backwards

(i) we reach $a_n \in A$ and stop because there is no $b^* \in B$ with $g(b^*) = a_n$;
(ii) we reach $b_n \in B$ and stop because no $a^* \in A$ satisfies $f(a^*) = b_n$;
(iii) the process goes on forever.

This partitions $B$ into three sets:

(1) $B_A$, the subset of elements in $B$ whose ancestry originates in $A$, as in (i).
(2) $B_B$, the subset of elements in $B$ whose ancestry originates in $B$, as in (ii).
(3) $B_\infty$, the subset of elements in $B$ whose ancestry can be traced back forever, as in (iii).


Note that $B_A$, $B_B$, $B_\infty$ are disjoint and their union is $B$, so they do indeed
give a partition. Similarly we can partition $A$ into $A_A$, $A_B$, $A_\infty$ whose ancestry
originates in $A$, $B$, or goes back forever, respectively.

It is easily seen that the restriction of $f$ to $A_A$ gives a bijection $f : A_A \to B_A$,
the restriction of $g$ to $B_B$ gives a bijection $g : B_B \to A_B$, and the restrictions
of $f$, $g$ both give bijections $f : A_\infty \to B_\infty$, $g : B_\infty \to A_\infty$. Using the first two
and one of the third, we concoct a bijection $F : A \to B$ by setting

$$F(a) = \begin{cases} f(a) & \text{if } a \in A_A, \\ g^{-1}(a) & \text{if } a \in A_B, \\ f(a) & \text{if } a \in A_\infty. \end{cases}$$


Fig. 14.6 Where does tracing back end? 


This completes the proof. 
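The case analysis in the proof becomes a procedure whenever the backward chains can actually be followed. The Python sketch below illustrates this under stated assumptions. First, we must be able to compute preimages: the hypothetical helpers `f_pre` and `g_pre` return `None` when there is none. Second, every backward chain must either stop or revisit an element within `max_steps` steps, because a non-repeating infinite chain cannot be recognised in finite time. The example takes $A = B = \mathbb{N}$ with $f(a) = 2a$ and $g(b) = 2b$, where every chain stops after finitely many halvings.

```python
def sb_bijection(f, g, f_pre, g_pre, max_steps=10_000):
    """Build F : A -> B as in the proof of Theorem 14.14.

    f_pre(b) returns the unique a with f(a) == b, or None if there is none;
    g_pre(a) returns the unique b with g(b) == a, or None if there is none."""
    def F(a):
        x, side = a, "A"
        seen = set()
        for _ in range(max_steps):
            if (side, x) in seen:        # the chain cycles, so a lies in A_infinity
                return f(a)
            seen.add((side, x))
            if side == "A":
                prev = g_pre(x)
                if prev is None:         # tracing back stopped in A: a lies in A_A
                    return f(a)
                x, side = prev, "B"
            else:
                prev = f_pre(x)
                if prev is None:         # tracing back stopped in B: a lies in A_B
                    return g_pre(a)
                x, side = prev, "A"
        raise RuntimeError("could not classify the chain within max_steps")
    return F


def halve(n):
    """Preimage under n -> 2n, or None when n is odd."""
    return n // 2 if n % 2 == 0 else None


# A = B = N (positive integers) with f(a) = 2a and g(b) = 2b.
F = sb_bijection(lambda a: 2 * a, lambda b: 2 * b, halve, halve)
```

In this example $F(a) = 2a$ when the highest power of 2 dividing $a$ has an even exponent, and $F(a) = a/2$ otherwise. One can check directly that this is a bijection from $\mathbb{N}$ to itself, even though neither $f$ nor $g$ is surjective.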


As an example of this theorem, we give an alternative proof that the rationals
are countable. The inclusion $i : \mathbb{N} \to \mathbb{Q}$ shows that $|\mathbb{N}| \le |\mathbb{Q}|$. Any non-zero
rational can be written uniquely in its lowest terms as $(-1)^n p/q$ where $n \in \{1, 2\}$
and $p, q \in \mathbb{N}$, so by unique factorisation the function $f : \mathbb{Q} \to \mathbb{N}$ given by $f(0) = 1$
and $f((-1)^n p/q) = 2^n 3^p 5^q$ is an injection, so $|\mathbb{Q}| \le |\mathbb{N}|$.
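This injection $\mathbb{Q} \to \mathbb{N}$ can be spot-checked in Python. The sketch assumes one way of making "uniquely" precise: the sign is recorded by $n \in \{1, 2\}$ ($n = 1$ for negative and $n = 2$ for positive rationals), and $0$ is sent to $1$, which is not of the form $2^n 3^p 5^q$ with $n, p, q \ge 1$. Python's `Fraction` always stores a rational in lowest terms with a positive denominator, and the function name `code` is our own.

```python
from fractions import Fraction


def code(x: Fraction) -> int:
    """f(0) = 1 and f((-1)^n p/q) = 2^n 3^p 5^q, with n = 1 for negative and n = 2 for positive x."""
    if x == 0:
        return 1
    n = 1 if x < 0 else 2
    p, q = abs(x.numerator), x.denominator   # Fraction keeps lowest terms, q > 0
    return 2**n * 3**p * 5**q


sample = {Fraction(p, q) for p in range(-12, 13) for q in range(1, 13)}
```

Distinct rationals in `sample` receive distinct codes. This is unique factorisation at work, since the exponents of 2, 3 and 5 can be read back from the code.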

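A more interesting example shows that $|\mathcal{P}(\mathbb{N})| = \aleph$. An injection
$f : \mathcal{P}(\mathbb{N}) \to \mathbb{R}$ can be obtained by

$$f(A) = 0.a_1 a_2 \ldots a_n \ldots$$

where

$$a_n = \begin{cases} 0 & \text{if } n \notin A, \\ 1 & \text{if } n \in A. \end{cases}$$

For each subset $A \subseteq \mathbb{N}$, this gives a unique decimal expansion. Since these
expansions contain no 9s, distinct subsets give distinct real numbers, and so $f$ is an
injection.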
To get an injection $g : \mathbb{R} \to \mathcal{P}(\mathbb{N})$ requires a little more cunning. Instead
of writing a real number as a decimal expansion, we express it as a bicimal,¹
which means that we write it as the limit of fractions of the form

$$a_0 + a_1/2 + a_2/4 + \cdots + a_n/2^n$$

where $a_0$ is an integer and $a_n$ is 0 or 1 for $n \ge 1$. If we exclude such
expressions concluding with an infinite sequence of 1s (the bicimal equivalent
of the decimal problem involving an infinite sequence of 9s), then such a
bicimal expansion is unique. Now express the integer $a_0$ in binary notation as

$$a_0 = (-1)^m b_k \ldots b_2 b_1$$

where $m$ and the digits $b_1, b_2, \ldots, b_k$ are all 0 or 1; then we have a unique
bicimal expansion for each real number $x$ in the form

$$x = (-1)^m b_k \ldots b_2 b_1 . a_1 a_2 \ldots a_n \ldots$$

where $m$ and each digit $b_1, \ldots, b_k, a_1, \ldots, a_n, \ldots$ is 0 or 1. For convenience,
in this case write $b_n = 0$ for $n > k$. Now write the terms out as a sequence in
the order $m, a_1, b_1, a_2, b_2, \ldots, a_n, b_n, \ldots$. This is a sequence of 0s and 1s and
defines a unique subset $A$ of $\mathbb{N}$ according to the rule

$r \in A$ if and only if the $r$th term of the sequence is 1.

In this way we obtain a function $g : \mathbb{R} \to \mathcal{P}(\mathbb{N})$ by defining $g(x)$ to be the
subset $A$ determined in this manner. This is an injection, and the Schröder-Bernstein
theorem shows that $|\mathbb{R}| = |\mathcal{P}(\mathbb{N})|$.
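For a rational $x$ the binary digits can be computed exactly, so the first terms of the sequence $m, a_1, b_1, a_2, b_2, \ldots$, and the corresponding part of $g(x)$, can be produced by machine. In this Python sketch (the name `g_prefix` is hypothetical) we read the displayed form of $x$ as sign and magnitude: the digits are taken from $|x|$ and the sign is recorded by $m$. The floor-based digit extraction never produces a tail of 1s, so it respects the convention above. Only the members $r \le 2n + 1$ of $g(x)$ are returned.

```python
import math
from fractions import Fraction


def g_prefix(x: Fraction, n: int) -> set:
    """Indices r <= 2n + 1 belonging to g(x), from the sequence m, a_1, b_1, ..., a_n, b_n."""
    m = 1 if x < 0 else 0
    y = abs(x)
    whole = math.floor(y)            # integer part of |x|, with binary digits b_k ... b_2 b_1
    frac = y - whole                 # fractional part, 0 <= frac < 1
    b = [(whole >> (i - 1)) & 1 for i in range(1, n + 1)]        # b_i = 0 for i > k
    a = [math.floor(frac * 2**i) % 2 for i in range(1, n + 1)]   # bicimal digits a_1 ... a_n
    seq = [m]
    for a_i, b_i in zip(a, b):
        seq += [a_i, b_i]
    return {r for r, term in enumerate(seq, start=1) if term == 1}
```

For example, $5/2$ is $10.1$ in binary, so $m = 0$, $a_1 = 1$ and $b_2 = 1$, and the terms equal to 1 are the 2nd and 5th: `g_prefix(Fraction(5, 2), 4)` is `{2, 5}`.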


Cardinal Arithmetic 


Just as we can add, multiply, and take powers of finite cardinals, we can 
mimic the set-theoretic procedures involved and define corresponding op- 
erations on infinite cardinals. Some, but not all, of the properties of ordinary 
arithmetic carry over to all cardinals, and it is most instructive to see which 
ones. First of all the definitions: 


Definition 14.15: The operations on cardinal numbers are as follows: 

Addition: Given two cardinals $\alpha$, $\beta$ (finite or infinite), select disjoint sets $A$,
$B$ such that $|A| = \alpha$, $|B| = \beta$. (This can always be done. If $A$ and $B$ are not
disjoint, replace them by $A' = A \times \{0\}$ and $B' = B \times \{1\}$. Obvious bijections
show that $|A'| = |A|$ and $|B'| = |B|$, and it is clear that $A'$ and $B'$ are disjoint.) Define $\alpha + \beta$
to be the cardinal of $A \cup B$.

Multiplication: If $\alpha = |A|$, $\beta = |B|$, then $\alpha\beta = |A \times B|$.

Powers: If $\alpha = |A|$, $\beta = |B|$, then $\alpha^\beta = |A^B|$ where $A^B$ is the set of all
functions from $B$ to $A$.

¹ Classical scholars will be horrified, but the word seems unavoidable because of its
connotations with ‘binary’ and ‘decimal’.


You should check that when the sets concerned are finite, these definitions
correspond to standard arithmetic. In particular, when $|A| = m$ and $|B| = n$,
then on defining a function $f : B \to A$, each element $b \in B$ has $m$ possible
choices of image, giving $m^n$ functions in all. Addition and multiplication are
quite easy in the finite case.
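For small finite sets the three constructions of Definition 14.15 can simply be counted. In this Python sketch (the example sets are our own) a function $B \to A$ is represented by the tuple of its values, so `itertools.product` lists all of them.

```python
from itertools import product

A = ["p", "q", "r"]   # |A| = m = 3
B = [0, 1]            # |B| = n = 2

# Disjoint copies A x {0} and B x {1}, as in the definition of addition.
disjoint_union = [(a, 0) for a in A] + [(b, 1) for b in B]
cartesian = list(product(A, B))
# A function B -> A is determined by its tuple of values (f(b) for b in B).
functions_B_to_A = list(product(A, repeat=len(B)))

print(len(disjoint_union), len(cartesian), len(functions_B_to_A))
```

The three counts are $3 + 2$, $3 \times 2$ and $3^2$. Taking the union of overlapping sets instead of disjoint copies would undercount, which is why the definition of addition insists on disjointness.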

Notice that the sets in the definition of addition have to be disjoint, but 
this is not necessary for the other two operations. For addition, the reason 
is that if $|A| = m$, $|B| = n$, and $A \cap B \ne \emptyset$, then $|A \cup B| < m + n$. The
most important fact to check about these definitions is that they are well
defined. Starting with cardinals $\alpha$, $\beta$, we must choose sets $A$, $B$ with
$|A| = \alpha$, $|B| = \beta$: it is essential to check that if different sets $A'$, $B'$ were used, then
the cardinal found in each case would be the same as before. In the case of
multiplication, for instance, if $|A| = |A'|$, $|B| = |B'|$, then there are bijections
$f : A \to A'$, $g : B \to B'$, which induce a bijection

$$h : A \times B \to A' \times B'$$

given by

$$h(a, b) = (f(a), g(b)).$$

Thus $|A \times B| = |A' \times B'|$, and the product cardinal is well defined. There are
corresponding proofs for addition and powers of cardinals.


If we investigate the properties of these arithmetic operations, we find that 
many properties of finite numbers continue to hold for cardinals: 


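Proposition 14.16: If $\alpha$, $\beta$, $\gamma$ are cardinals (finite or infinite), then

(i) $\alpha + \beta = \beta + \alpha$,
(ii) $(\alpha + \beta) + \gamma = \alpha + (\beta + \gamma)$,
(iii) $\alpha + 0 = \alpha$,
(iv) $\alpha\beta = \beta\alpha$,
(v) $(\alpha\beta)\gamma = \alpha(\beta\gamma)$,
(vi) $1\alpha = \alpha$,
(vii) $\alpha(\beta + \gamma) = \alpha\beta + \alpha\gamma$,
(viii) $\alpha^{\beta + \gamma} = \alpha^\beta \alpha^\gamma$,
(ix) $\alpha^{\beta\gamma} = (\alpha^\beta)^\gamma$,
(x) $(\alpha\beta)^\gamma = \alpha^\gamma \beta^\gamma$.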
Proof: Let $A$, $B$, $C$ be (disjoint) sets with cardinals $\alpha$, $\beta$, $\gamma$, respectively. 0 is
the cardinal of $\emptyset$ and 1 is the cardinal of any one-element set, say $\{0\}$.
(i)-(iii) follow trivially because $A \cup B = B \cup A$, $(A \cup B) \cup C = A \cup (B \cup C)$,
and $A \cup \emptyset = A$.

(iv)-(vi) follow because there are obvious bijections $f : A \times B \to B \times A$ given
by $f((a, b)) = (b, a)$, $g : (A \times B) \times C \to A \times (B \times C)$ given by $g(((a, b), c)) = (a, (b, c))$,
and $h : \{0\} \times A \to A$ given by $h((0, a)) = a$.

(vii) results from the equality $A \times (B \cup C) = (A \times B) \cup (A \times C)$, where $A \times B$
and $A \times C$ are disjoint because $B$ and $C$ are.

If the last three seem harder, it is because we are less familiar with the set
of functions $A^B$ from $B$ to $A$. It is enough to set up the appropriate bijections.
(viii) Define $f : A^{B \cup C} \to A^B \times A^C$ by starting with a map $\phi : B \cup C \to A$,
defining $\phi_1 : B \to A$ to be the restriction of $\phi$ to $B$ and $\phi_2 : C \to A$ to be the
restriction of $\phi$ to $C$, then put $f(\phi) = (\phi_1, \phi_2)$. This function $f$ is a bijection.
(ix) Define $g : A^{B \times C} \to (A^B)^C$ by starting with a function $\phi : B \times C \to A$,
then defining the function $g(\phi) : C \to A^B$ by letting $[g(\phi)](c) : B \to A$ be the
function that takes $b \in B$ to

$$([g(\phi)](c))(b) = \phi((b, c)).$$

As this is less familiar, it is worth demonstrating that $g$ is a bijection. It is
injective, for if $g(\phi) = g(\psi)$ for two maps $\phi$, $\psi$ from $B \times C$ to $A$, then

$$([g(\phi)](c))(b) = ([g(\psi)](c))(b) \quad \text{for all } b \in B, c \in C,$$

so, by definition,

$$\phi((b, c)) = \psi((b, c)) \quad \text{for all } b \in B, c \in C,$$

which means that $\phi = \psi$.
To show that $g$ is surjective, start with a function $\theta \in (A^B)^C$. That is,
$\theta : C \to A^B$. Then define $\phi : B \times C \to A$ by

$$\phi((b, c)) = [\theta(c)](b) \quad \text{for all } b \in B, c \in C.$$

We have $g(\phi) = \theta$, as required.
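The bijection $g$ of part (ix) is what programmers call currying. The following Python sketch checks it on small finite sets of our own choosing. It represents a function on $B \times C$ as a dictionary, and an element of $(A^B)^C$ as a tuple indexed by $C$ whose entries are tuples indexed by $B$. The helper names are hypothetical.

```python
from itertools import product

A = (0, 1)
B = ("x", "y")
C = (0, 1, 2)
BxC = [(b, c) for b in B for c in C]


def all_functions(domain, codomain):
    """Every function domain -> codomain, each as a dict."""
    for values in product(codomain, repeat=len(domain)):
        yield dict(zip(domain, values))


def curry(phi):
    """g(phi): sends c to the function b -> phi[(b, c)], stored as a tuple indexed by B."""
    return tuple(tuple(phi[(b, c)] for b in B) for c in C)


def uncurry(theta):
    """The inverse of g: phi((b, c)) = [theta(c)](b)."""
    return {(b, c): theta[j][i] for j, c in enumerate(C) for i, b in enumerate(B)}


images = [curry(phi) for phi in all_functions(BxC, A)]
```

All $2^6 = 64$ functions on $B \times C$ have different images, and there are exactly $(2^2)^3 = 64$ elements of $(A^B)^C$, so the finite case of (ix) can be read off directly.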
(x) The final equality between cardinals follows from the bijection
$h : (A \times B)^C \to A^C \times B^C$ given by writing any $\phi : C \to A \times B$ in terms of
its coordinates,

$$\phi(c) = (\phi_1(c), \phi_2(c)) \quad \text{for } c \in C,$$

and then setting $h(\phi) = (\phi_1, \phi_2)$. Checking the details is left to you.
Now we perform some explicit calculations with cardinals. As a corollary of
proposition 14.9, we find that

$$n + \aleph_0 = \aleph_0 + n = \aleph_0 \quad \text{for any finite cardinal } n,$$
$$\aleph_0 + \aleph_0 = \aleph_0.$$

This shows that there is no possibility of defining subtraction of cardinals
where infinite cardinals are involved, for what would $\aleph_0 - \aleph_0$ be? According
to the above results it could be any finite cardinal or $\aleph_0$ itself, so subtraction
cannot be defined to ensure that

$$\aleph_0 - \aleph_0 = \alpha \iff \aleph_0 = \aleph_0 + \alpha.$$

From proposition 14.10 it is easy to deduce that

$$n\aleph_0 = \aleph_0 n = \aleph_0 \quad \text{for } n \in \mathbb{N},$$
$$\aleph_0 \aleph_0 = \aleph_0.$$


It is interesting to calculate $0\aleph_0$. This turns out to be zero. In fact

$$0\beta = 0 \quad \text{for each cardinal number } \beta.$$

This is because

$$A = \emptyset \Rightarrow A \times B = \emptyset \quad \text{for any other set } B,$$

for if $A$ has no elements, then there are no ordered pairs $(a, b)$ for $a \in A$, $b \in B$.
This means that, in terms of cardinal numbers, zero times infinity is zero,
no matter how big the infinite cardinal is.

Likewise, it is instructive to calculate $\alpha^0$ and $\alpha^1$ for any cardinal $\alpha$. By
definition, if $|A| = \alpha$, then $\alpha^0$ is the cardinal number of the set of functions
from $\emptyset$ to $A$. You might be forgiven for thinking that there are no functions
from $\emptyset$ to $A$, but the set-theoretic definition of a function $f : \emptyset \to A$ as a
subset of $\emptyset \times A$ exhibits just one such function, the empty subset of $\emptyset \times A$.
So $\alpha^0 = 1$. Since $|\{0\}| = 1$, $\alpha^1$ is the cardinal number of the set of functions
from $\{0\}$ to $A$. A function $f : \{0\} \to A$ is uniquely determined by the element
$f(0) \in A$, so there is a bijection $g : A^{\{0\}} \to A$ given by $g(f) = f(0)$, showing
$|A^{\{0\}}| = |A|$, or $\alpha^1 = \alpha$. By induction using proposition 14.16(viii), we get

$$(\aleph_0)^0 = 1, \quad (\aleph_0)^n = \aleph_0 \quad \text{for } n \in \mathbb{N}.$$
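The same conventions hold in Python's `itertools.product` when a function is represented by its tuple of values. With `repeat=0` it yields exactly one item, the empty tuple, which plays the role of the single empty function from $\emptyset$ to $A$. With `repeat=1` it yields one function for each element of $A$. The example set is our own.

```python
from itertools import product

A = ["x", "y", "z"]

# Functions from the empty set to A: exactly one, the empty tuple,
# matching the single empty subset of the empty set x A.
empty_functions = list(product(A, repeat=0))

# Functions from {0} to A correspond to their single value f(0).
one_point_functions = [dict(zip([0], values)) for values in product(A, repeat=1)]

print(len(empty_functions), len(one_point_functions))
```

The printed counts are $1 = \alpha^0$ and $3 = \alpha^1$ for $\alpha = 3$. Even `product([], repeat=0)` yields one empty tuple, in line with $0^0 = 1$ under the set-theoretic definition.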


If we calculate $2^\alpha$ for any cardinal $\alpha$, we get an interesting result in terms of
the power set. Suppose that $|A| = \alpha$; then, since $|\{0, 1\}| = 2$, we have

$$|\{0, 1\}^A| = 2^\alpha.$$

But a function $\phi : A \to \{0, 1\}$ corresponds precisely to a subset of $A$, namely
$\{a \in A \mid \phi(a) = 1\}$.

Define $f : \{0, 1\}^A \to \mathcal{P}(A)$ by $f(\phi) = \{a \in A \mid \phi(a) = 1\}$; then $f$ is a bijection,
so $|\mathcal{P}(A)| = 2^\alpha$. From proposition 14.13 we see that

$$2^\alpha > \alpha \quad \text{for all cardinal numbers } \alpha.$$
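The correspondence between functions $A \to \{0, 1\}$ and subsets of $A$ is what programmers know as characteristic functions, or bit masks. This Python sketch, using a small set of our own choosing and hypothetical helper names, checks that $f$ and its inverse match up all $2^4$ functions with all subsets.

```python
from itertools import product

A = ("a", "b", "c", "d")


def to_subset(phi):
    """f(phi) = {a in A | phi(a) = 1}, with phi stored as a tuple of 0s and 1s indexed by A."""
    return frozenset(a for a, bit in zip(A, phi) if bit == 1)


def to_function(S):
    """The inverse map: the characteristic function of S."""
    return tuple(1 if a in S else 0 for a in A)


all_phi = list(product((0, 1), repeat=len(A)))
subsets_hit = {to_subset(phi) for phi in all_phi}
```

There are $2^4 = 16$ functions and they give 16 different subsets, so $|\mathcal{P}(A)| = 2^4$.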


Order Relations on Cardinals 


We have already proved a number of results concerning the order of car- 
dinals at various points in this chapter. It is now an opportune moment to 
collect these together and make the list more comprehensive by filling in 
the gaps: 


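Proposition 14.17: If $\alpha$, $\beta$, $\gamma$, $\delta$ are cardinals (finite or infinite) then

(i) $\alpha \le \beta,\ \beta \le \gamma \Rightarrow \alpha \le \gamma$,
(ii) $\alpha \le \beta,\ \beta \le \alpha \Rightarrow \alpha = \beta$,
(iii) $\alpha \le \beta,\ \gamma \le \delta \Rightarrow \alpha + \gamma \le \beta + \delta$,
(iv) $\alpha \le \beta,\ \gamma \le \delta \Rightarrow \alpha\gamma \le \beta\delta$,
(v) $\alpha \le \beta,\ \gamma \le \delta \Rightarrow \alpha^\gamma \le \beta^\delta$.

Proof: Select sets $A$, $B$, $C$, $D$ with cardinals $\alpha$, $\beta$, $\gamma$, $\delta$.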


(i) If $f : A \to B$, $g : B \to C$ are injections, then $gf : A \to C$ is an
injection.
(ii) This is the Schröder-Bernstein theorem.
(iii) Given injections $f : A \to B$, $g : C \to D$ where $A \cap C = \emptyset$, $B \cap D = \emptyset$,
define $h : A \cup C \to B \cup D$ by

$$h(x) = \begin{cases} f(x) & \text{for } x \in A, \\ g(x) & \text{for } x \in C. \end{cases}$$

Since $A \cap C = \emptyset$, this is well defined, and since $B \cap D = \emptyset$, the fact that $f$, $g$
are injections implies $h$ is an injection.


(iv) Given injections $f : A \to B$, $g : C \to D$, define $p : A \times C \to B \times D$
by

$$p((a, c)) = (f(a), g(c)) \quad \text{for all } a \in A, c \in C.$$

Clearly $p$ is an injection (for if $p((a_1, c_1)) = p((a_2, c_2))$, then
$(f(a_1), g(c_1)) = (f(a_2), g(c_2))$, so $f(a_1) = f(a_2)$, $g(c_1) = g(c_2)$, and
the injectivity of $f$, $g$ implies $a_1 = a_2$, $c_1 = c_2$).

(v) This is best visualised by considering $A \subseteq B$, $C \subseteq D$. (If we are given
injections $f : A \to B$, $g : C \to D$, replace $A$ by $f(A) \subseteq B$, and $C$ by
$g(C) \subseteq D$ in the argument that follows.)

For $A \subseteq B$, $C \subseteq D$, to define a map $u : A^C \to B^D$, all we need to do is
to show how to extend a function $\phi : C \to A$ to a function $u(\phi) : D \to B$.
(The function $u(\phi) : D \to B$ isn't usually an injection; don't confuse this
with the function $u : A^C \to B^D$.) The easiest way to do this is to select an
element $b \in B$ (any one will do; the exceptional case $B = \emptyset$ easily implies (v)
by a separate argument); then define $u(\phi) \in B^D$ by

$$[u(\phi)](d) = \begin{cases} \phi(d) & \text{if } d \in C, \\ b & \text{if } d \in D \setminus C. \end{cases}$$

Then $u : A^C \to B^D$ is an injection because $u(\phi_1) = u(\phi_2)$ implies

$$[u(\phi_1)](d) = [u(\phi_2)](d) \quad \text{for all } d \in D;$$

in particular, this means that

$$\phi_1(d) = \phi_2(d) \quad \text{for all } d \in C,$$

so $\phi_1 = \phi_2$.


Looking at this last proposition, there is a notable omission from the list 
of properties we might expect of an order relation. We have not asserted 
that any two cardinal numbers are comparable; that is, given cardinals $\alpha$, $\beta$,
then either $\alpha \le \beta$ or $\beta \le \alpha$. What this would amount to is selecting sets
$A$, $B$ with cardinals $\alpha$, $\beta$ respectively and showing that there is either an injection
$f : A \to B$ or $g : B \to A$ (or both). To be able to construct such an
injection, we would either have to know something about the sets A and B, 
or we would need some general method of proceeding with the construc- 
tion of a suitable injection. Given specific sets, we can proceed in an ad hoc 
fashion and use our ingenuity to try to set up an injection from one to the 
other. A general method that works for all sets requires us to be much more 
precise about what we mean by a set. It strains the bounds of set theory. 
Until we put specific restrictions on what we mean by the word ‘set’ we can- 
not say how to compare two of them. The theory of sets has grown into a 
large and living plant; to nourish it we must put down stronger roots into the 
foundations. 
Exercises

1. Let $X$ be the set of points $(x, y, z) \in \mathbb{R}^3$ such that $x, y, z \in \mathbb{Q}$. Is $X$ countable?

2. Let $S$ be the set of spheres in $\mathbb{R}^3$ whose centres have rational coordinates and whose radii are rational. Show that $S$ is countable.

3. Let $[0, 1[$ be the set of real numbers $x$ such that $0 \le x < 1$. By writing each one as a decimal expansion, prove that $[0, 1[$ is uncountable.

4. Which of the following sets are countable? (Prove or disprove each case.)
(a) $\{n \in \mathbb{N} \mid n \text{ is prime}\}$
(b) $\{r \in \mathbb{Q} \mid r > 0\}$
(c) $\{x \in \mathbb{R} \mid 1 < x < 10^{1{,}000{,}000}\}$
(d) $\mathbb{C}$
(e) $\{x \in \mathbb{R} \mid x^2 = 2^a 3^b \text{ for some } a, b \in \mathbb{N}\}$.

5. If $a, b \in \mathbb{R}$ and $a < b$, the closed interval $[a, b]$ is
$$[a, b] = \{x \in \mathbb{R} \mid a \le x \le b\},$$
the open interval is
$$]a, b[\ = \{x \in \mathbb{R} \mid a < x < b\},$$
and the half-open intervals are
$$[a, b[\ = \{x \in \mathbb{R} \mid a \le x < b\},$$
$$]a, b] = \{x \in \mathbb{R} \mid a < x \le b\}.$$
Prove for $a < b$, $c < d$, that $f : [a, b] \to [c, d]$ given by
$$f(x) = \frac{(b - x)c}{b - a} + \frac{(x - a)d}{b - a}$$
is a bijection. Deduce that any two closed intervals have the same cardinal number.
Prove also that $[a, b]$, $]a, b[$, $[a, b[$, $]a, b]$ all have the same cardinal number. (Hint: Show that $[a, b]$ has the same cardinal number as any one of the other three by choosing $c$, $d$ such that $a < c < d < b$, and then using the Schröder-Bernstein theorem.)

6. Prove that the cardinal number of a closed interval, an open interval, and a half-open interval is $\aleph$.

7. Prove that between any two distinct real numbers there are a countable number of rationals and an uncountable number of irrationals.
8. Construct an explicit bijection from $[0, 1]$ to $[0, 1[$. (If all else fails, try using the Schröder-Bernstein construction on the injections $f : [0, 1] \to [0, 1[$, $f(x) = \tfrac12 x$, and $g : [0, 1[ \to [0, 1]$, $g(x) = x$.)

9. If $A_1$, $A_2$ are arbitrary sets, prove
$$|A_1| + |A_2| = |A_1 \cup A_2| + |A_1 \cap A_2|.$$
Generalise to $n$ sets $A_1, \ldots, A_n$.

10. Find counterexamples which demonstrate that the following general statements are false for cardinal numbers $\alpha$, $\beta$, $\gamma$:
(a) $\alpha < \beta \Rightarrow \alpha + \gamma < \beta + \gamma$
(b) $\alpha < \beta \Rightarrow \alpha\gamma < \beta\gamma$
(c) $\alpha < \beta \Rightarrow \alpha^\gamma < \beta^\gamma$
(d) $\alpha < \beta \Rightarrow \gamma^\alpha < \gamma^\beta$.

11. (a) Define $f : [0, 1[ \times [0, 1[ \to [0, 1[$ by
$$f(0.a_1 a_2 \ldots a_n \ldots,\ 0.b_1 b_2 \ldots b_n \ldots) = 0.a_1 b_1 a_2 b_2 \ldots a_n b_n \ldots$$
Deduce that $\aleph^2 = \aleph$.
(b) Prove the result of (a) more elegantly by using $2^{\aleph_0} = \aleph$ and the properties of cardinal arithmetic.
(c) Using $1\aleph \le \aleph_0\aleph \le \aleph\aleph$, or otherwise, find $\aleph_0\aleph$.
(d) What is $n\aleph$ for $n \in \mathbb{N}$?
(e) Prove $\aleph^{\aleph_0} = \aleph$ and $\aleph_0^{\aleph_0} = 2^{\aleph_0}$.
(f) Find $\aleph^\aleph$.

12. Given an infinite cardinal $\alpha$, it may be shown that there exists a cardinal number $\beta$ such that $\alpha = \aleph_0\beta$. Use this to show that $\aleph_0\alpha = \alpha$.

13. (The proof by which Cantor showed that there exist transcendental numbers without actually specifying any!) A real number is algebraic if it is a solution of a polynomial equation
$$a_n x^n + \cdots + a_1 x + a_0 = 0$$
with integer coefficients. If not, it is transcendental.
(a) Show that the set of polynomials with integer coefficients is countable.
(b) Show that the set of algebraic numbers is countable.
(c) Show that some real numbers must be transcendental.
(d) How many transcendental numbers are there?
CHAPTER 15


Infinitesimals


As an ongoing theme, we have built a formal model of the real numbers
$\mathbb{R}$ as a complete ordered field. This construction reveals the real
numbers as the unique structure, up to isomorphism, that satisfies
the given axioms. Visually, the real numbers fill the geometric real line, so
it seems impossible to fit any more points between them. For instance, an
element $x \in \mathbb{R}$ cannot be arbitrarily small, in the sense that $0 < x < r$ for
all positive $r \in \mathbb{R}$. If we try to find such an $x \in \mathbb{R}$, then $r = \tfrac12 x$ would be
smaller, a contradiction.

Yet, in the historical development of the calculus, the idea arose of quantities
$x$, $y$ that can vary by ‘arbitrarily small’ quantities $dx$ and $dy$, and such
ideas remain today in practical applications. Given a relation such as $y = x^2$,
when $x$ changes to $x + dx$, then $y$ changes to $y + dy = (x + dx)^2$, and Leibniz
calculated

$$\frac{dy}{dx} = \frac{(x + dx)^2 - x^2}{dx} = 2x + dx.$$


He went on to argue that if dx is infinitesimally small, it does not change 
the value of 2x significantly, so the rate of change dy/dx can be taken to be 
2x exactly. Newton used a physical image of ‘flowing’ quantities to justify a 
similar calculation in different notation. 

This proposal led to centuries of dispute over the legitimacy of such ar- 
guments, focusing on the problem that if dx is not zero, then 2x + dx is not 
precisely equal to 2x, but if dx is equal to zero then it cannot be used as the 
denominator of the quotient dy/dx. 

The problem was eventually resolved to the satisfaction of pure mathematicians
by introducing the idea of limit and the modern definitions of
analysis. Instead of speaking of quantities that are ‘arbitrarily close’, the question
of how close is reformulated as a finite challenge. The limiting value is specified
as a real number $L$ that satisfies:
Tell me how accurate you want the result to be by specifying an error
$\varepsilon > 0$, and then I will specify $\delta > 0$ so that when $dx$ is non-zero and smaller
in size than $\delta$ then the difference between $(f(x + dx) - f(x))/dx$ and $L$ is
less than the error $\varepsilon$ you require.
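For $f(x) = x^2$ the challenge can be met explicitly. The quotient is exactly $2x + dx$, so its distance from $L = 2x$ is $|dx|$, and the choice $\delta = \varepsilon$ works. This Python sketch, with hypothetical helper names and sample values of our own, checks that choice at a few points on both sides of $0$. It uses exact `Fraction` arithmetic to avoid floating-point rounding.

```python
from fractions import Fraction


def square(t):
    return t * t


def difference_quotient(f, x, dx):
    """(f(x + dx) - f(x)) / dx for non-zero dx."""
    return (f(x + dx) - f(x)) / dx


def delta_for(eps):
    """For f(x) = x**2 the quotient is 2x + dx, so |quotient - 2x| = |dx| < eps when |dx| < eps."""
    return eps


x = Fraction(3)
L = 2 * x
eps = Fraction(1, 1000)
delta = delta_for(eps)
# Non-zero sample values of dx with |dx| < delta, on both sides of 0.
samples = [delta * Fraction(k, 10) for k in range(-9, 10) if k != 0]
errors = [abs(difference_quotient(square, x, dx) - L) for dx in samples]
```

Every entry of `errors` is below `eps`. This is the ‘finite challenge’ in miniature: $dx$ is never zero, and no infinitesimal is needed.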


This approach led to the modern formulation of analysis, but an accidental 
by-product was the elimination of infinitesimals. First, when Cantor intro- 
duced infinite cardinals he showed that they can be added and multiplied, but 
not subtracted or divided. As a consequence, an infinitesimal cannot exist as 
the reciprocal of an infinite cardinal. 

Second, he was also the first person to construct the real numbers using 
Cauchy sequences of rationals and he proved the completeness axiom for 
the real numbers. Introducing irrational numbers to ‘fill in the gaps’ between 
the rationals did not leave any room on the number line for even smaller 
infinitesimals. 

Richard Dedekind formulated an alternative construction of the real numbers
using ‘Dedekind cuts’. These divide the rational numbers $\mathbb{Q}$ into two
disjoint subsets, one subset to the left $L$ and one to the right $R$, where every
element of the left subset is to the left of every element in the right subset.
There are two kinds of cut. The first occurs when there is a rational number
$a$ such that all rationals less than $a$ are in $L$ and all those larger than $a$
are in $R$; then $a$ can be placed in either. The other is typified by the case
where $R$ consists of all positive rationals $r$ satisfying $r^2 > 2$ and $L$ is everything
else. This ‘cut’ does not occur at a rational number; it corresponds to a
new phenomenon on the number line: the square root of two.


Fig. 15.1 An irrational cut


Dedekind cuts are an alternative method to construct a system of real 
numbers containing the rational and irrational numbers. 

The theories of Cantor and Dedekind supported the idea that the real 
number line is complete—not only in the axiomatic mathematical sense that 
all Cauchy sequences of rational numbers converge to a unique real number, 
but in the intuitive sense that the extra irrational numbers fill up the number 
line. This was later formulated as: 


The Cantor-Dedekind Axiom: The real numbers are order isomorphic 
to the linear continuum of geometry. 
According to this proposed axiom, the geometric real line corresponds
precisely with the arithmetic decimal number line up to order isomorphism: 
the unique complete ordered field. Completeness of the real line was taken 
to mean that there is no room on it to fit in infinitesimals. 

This view was widely accepted in the early twentieth century, and in- 
finitesimals were usually excluded from mathematical analysis. Yet applied 
mathematicians continued to use ‘arbitrarily small’ quantities as a meaning- 
ful way of thinking about the calculus. Infinitesimals seemed to be useful in 
practice but problematic in theory. 

We have seen that such a phenomenon is common when mathematical 
systems have been successively generalised over centuries. Intuitive assump- 
tions sometimes acquire iconic status, and can’t be questioned even in a new 
context. To avoid this error we ask: if there are no infinitesimals in the real 
numbers, could such quantities exist in a larger system than the reals? We 
already know that we can place the real number line in the broader complex 
plane. Is there any possible way that we could introduce an extension of the 
real number line that incorporates infinitesimals? 

For instance, can we imagine an ordered field $K$ that contains a subfield
isomorphic to $\mathbb{R}$, but in which there are elements $x \in K$ such that $0 < x < r$
for all positive $r \in \mathbb{R}$? If we can, the earlier contradiction can no longer be
obtained by taking $r = \tfrac12 x$, because $\tfrac12 x$ is not in $\mathbb{R}$, only in $K$.

We therefore give a formal definition: 


Definition 15.1: If $K$ is an ordered field with $\mathbb{R}$ as an ordered subfield, then $x \in K$ is
said to be infinitesimal in $K$ if $x \ne 0$ and $-r < x < r$ for all positive $r \in \mathbb{R}$.


Such a possibility does not contradict Cantor’s theory of real numbers, 
nor his theory of infinite cardinals. The infinitesimals in a field are not the 
reciprocals of infinite cardinal numbers nor are they real numbers. They are 
elements in the ordered field K. 


Ordered Fields Larger than the Real Numbers 
Many fields contain the real numbers as a subfield. An easy example is the 
field R(x) of rational expressions with elements 


$$\frac{a_n x^n + \cdots + a_0}{b_m x^m + \cdots + b_0} \quad (\text{where } a_r, b_s \in \mathbb{R},\ b_m \ne 0).$$

This forms a field in which elements of the form $a_0/1$ correspond to the real
numbers. In this way we can consider $\mathbb{R}$ as a subfield of $\mathbb{R}(x)$.
The field R(x) can be given an order in various ways. For example, on page
214, example 10.3, we showed how to order R(t) by saying that f(t) > g(t) if
the graph of f is higher than the graph of g for large real values of t. In this
sense t is larger than any real number k, so t is infinite in this order and 1/t is
infinitesimal.

For convenience we now consider the order using x = 1/t and speak of
the ordering on R(x) where x is to be infinitesimal. This involves comparing
graphs for small positive real values of x, saying that f(x) < g(x) if the graph
of f is below the graph of g in a sufficiently small interval to the right of the
origin.

For instance, the following picture shows three such graphs, y = x, y = x²,
and y = 2.


Fig. 15.2 Graphs of rational functions 


At different values of x a vertical line meets these graphs at various points
and the comparative order may be different. For instance, marking the point
where the graph y = 2 meets a vertical line with a circle ●, y = x with a triangle
▲, and y = x² with a square ■, we can see that in position A we have the order
by height ■ < ▲ < ●, but in position B we have ▲ < ● < ■. As the
vertical line varies, the constant elements in R (such as y = 2) remain in the
same place, but the others vary.

However, if we consider what happens as the line A moves to the left,
getting closer and closer to the vertical y-axis, the order settles down to
■ < ▲ < ●, which suggests the order x² < x < 2. The same happens
if the constant 2 is replaced by any real number r > 0: for all x with 0 < x < r
and x < 1 we have 0 < x² < x < r.

This suggests a possible way to order the rational functions so that x is
positive and satisfies x < r for all positive real numbers r. To give the
field R(x) the structure of an ordered field, we define a subset R(x)* that
satisfies axioms (O1)-(O3) on page 189.


Definition 15.2: A rational function f(x) is in R(x)* if it is either zero or 
strictly positive on some interval 0 < x < k. 


To prove that this makes R(x) into an ordered field in which x is infinitesimal,
note that a non-zero rational function a(x) = p(x)/q(x) has only a finite
number of places where the polynomial p(x) = 0, and only a finite number
of places where q(x) = 0 and the rational function is undefined. Let k
be the smallest positive value among these points (or k = 1 if there are none).
Then a(x) is non-zero on the interval 0 < x < k, and it cannot be positive in
one place and negative in another because it would then be zero somewhere
in between. (Here we are assuming the intermediate value theorem, proved in
any course on analysis.)


Proposition 15.3: The field R(x) is an ordered field, with R(x)* as the set 
of elements that are zero or positive. 


Proof: (O1) If a(x), b(x) ∈ R(x)*, then each is a rational function that is
either zero or strictly positive in some interval to the right of the origin, so
their sum and product are each either zero or strictly positive in the smaller
of the two intervals concerned.

(O2) If a(x) ∈ R(x) then either a(x) = 0 or a(x) is strictly positive or strictly
negative in an interval to the right of the origin, so either a(x) ∈ R(x)* or
−a(x) ∈ R(x)*.

(O3) If a(x) ∈ R(x)* and −a(x) ∈ R(x)* then a(x) cannot be both strictly
positive and strictly negative near the origin, so it must be zero.
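For a hands-on check, the order of Definition 15.2 can be decided mechanically. For small x > 0 a non-zero polynomial p(x) = xʲ(aⱼ + aⱼ₊₁x + ⋯) with aⱼ ≠ 0 has the sign of aⱼ, so p/q lies in R(x)* exactly when p = 0 or the lowest-degree non-zero coefficients of p and q have the same sign. The Python sketch below is our own illustration, not part of the text: it assumes a rational function is given as a pair (numerator, denominator) of coefficient lists, lowest degree first, with integer or `Fraction` entries, and all function names are hypothetical.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists, lowest degree first."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result

def poly_sub(p, q):
    """Difference p - q of two coefficient lists."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return [a - b for a, b in zip(p, q)]

def lowest_sign(coeffs):
    """Sign of the lowest-degree non-zero coefficient; 0 for the zero polynomial.

    For small x > 0, p(x) = x**j * (a_j + a_(j+1)*x + ...) has the sign of a_j.
    """
    for c in coeffs:
        if c != 0:
            return 1 if c > 0 else -1
    return 0

def in_positive_set(f):
    """True if f = p/q is zero or strictly positive on some interval 0 < x < k."""
    p, q = f
    if lowest_sign(q) == 0:
        raise ValueError("the denominator must be a non-zero polynomial")
    return lowest_sign(p) * lowest_sign(q) >= 0

def less_than(f, g):
    """f < g in R(x), ordered so that x is a positive infinitesimal."""
    (p1, q1), (p2, q2) = f, g
    diff = (poly_sub(poly_mul(p2, q1), poly_mul(p1, q2)), poly_mul(q1, q2))  # g - f
    return lowest_sign(diff[0]) != 0 and in_positive_set(diff)

x = ([0, 1], [1])                  # the rational function x
x_squared = ([0, 0, 1], [1])       # x**2
two = ([2], [1])                   # the constant 2
tiny = ([Fraction(1, 10**9)], [1])  # the constant 10**-9
print(less_than(x_squared, x), less_than(x, two), less_than(x, tiny))  # True True True
print(less_than(([10**6], [1]), ([1], [0, 1])))                      # 10**6 < 1/x: True
```

The last comparison shows 1/x exceeding 10⁶ in this order, in line with the earlier remark that 1/t is infinitesimal when t is infinite.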


The order on R(x) is defined in a technical manner, but it satisfies the 
required axioms of an ordered field. When we try to imagine the elements of 
this field, there are several possibilities. The first is to consider the field as a 
purely symbolic set of quotients of polynomials in a single unknown x with 
the usual algebraic operations on the elements. Another is to visualise the 
elements as graphs of rational functions. 

A third possibility is to imagine points where the graphs meet a vertical
line x = v as v is a variable real number that becomes smaller. This represents
the elements of R(v) as points on the vertical line where x is replaced by v.
Now we can think of the terms symbolically as rational functions in v where
v is a variable quantity.
Fig. 15.3 Elements of R(v) as variable quantities


The figure shows the vertical line with three points on it corresponding to
the values v, v², and r, where r is a fixed positive real number. In an interval
to the right of the origin where v < r and v < 1, the points are in ascending
order v², v, and r, representing the order v² < v < r.

This is an interesting idea as it reveals two kinds of quantity: constant quan- 
tities which correspond to real numbers in R that remain in a fixed position, 
and variable quantities corresponding to non-constant rational functions, 
represented as points that vary as v becomes small. 

In particular, a variable quantity like v, that becomes smaller than any fixed
positive real number as the line x = v moves towards the vertical axis x = 0,
is an infinitesimal in this ordered field. The point marked v satisfies
0 < v < r as soon as v gets smaller than r. And 0 < v² < v for v < 1, which
shows that v² is an even smaller infinitesimal than v.

Can we take v to be so small that it is infinitesimal? No. Mathematically 
a complete ordered field cannot contain an infinitesimal. Furthermore, the 
way in which Dedekind and Cantor completed the real line by introducing 
the irrational numbers suggests that there is simply no room on the number 
line to fit in infinitesimals. But is there another way of visualising infini- 
tesimals? The answer is ‘Yes!’ We can achieve this, but not by restricting 
ourselves to the real numbers. We simply work in a larger ordered field. 


Super Ordered Fields 


Formal mathematics lets us define new concepts with useful properties, and
then use these new concepts as a basis for further proofs. In our search for
infinitesimals, we define a new concept:


Definition 15.4: A super ordered field is an ordered field K that contains 
the real numbers R as a proper ordered subfield. 
You will not find this definition in any other texts at the moment. We have
taken the opportunity to formulate a new definition to show how mathematical
theory evolves into the future. This definition provides precisely the formal
structure needed for infinitesimals in an ordered field.


The Structure Theorem for Super Ordered Fields 


Structure Theorem 15.5: Let K be a super ordered field. Then an element
k ∈ K where k ∉ R satisfies precisely one of the following:


(a) k > r for all real numbers r,
(b) k < r for all real numbers r,
(c) there is a unique real number c so that k = c + e where e is infinitesimal.


Proof: Either k satisfies (a) or (b), or there exist a, b ∈ R with a < k < b.
Consider the set S = {x ∈ R | x < k}. This is non-empty (because a ∈ S) and
bounded above by b, so it has a least upper bound c where a ≤ c ≤ b. Let
e = k − c, so that k = c + e. The element e cannot be zero because k ∉ R. If e
is positive, either e is infinitesimal or there is r ∈ R such that 0 < r < e.
Adding c leads to c < c + r < c + e = k. This gives a real number c + r less
than k and therefore in S, contradicting c being an upper bound of S. Hence e
is infinitesimal.

On the other hand, if e is negative and not infinitesimal, then e < −r < 0
for some positive r ∈ R and k = c + e < c − r < c, giving a real number c − r
exceeding k. Then c − r is an upper bound of S that is less than the purported
least upper bound c. So again, e is infinitesimal.

Finally, c is unique. If also k = c′ + e′ with c′ ∈ R and e′ infinitesimal, then
the real number c − c′ = e′ − e satisfies −r < c − c′ < r for every positive
r ∈ R (because −r/2 < e, e′ < r/2), so c − c′ = 0.


This theorem gives information about the structure of any super ordered 
field K. Such a field has properties that resonate with historical ideas of finite 
and infinitesimal quantities. We choose to name the elements of K quantities. 
They are either 


constant quantities: elements in R,
positive infinite quantities: elements k > r for all r ∈ R,
negative infinite quantities: elements k < r for all r ∈ R,


or
finite quantities of the form k = c + e where c ∈ R and e is infinitesimal.


This resonates strongly with the historical view of constant and vari- 
able quantities. A super ordered field consists precisely of quantities that 
are either constant, infinite (positive or negative), or a constant plus an 
infinitesimal. 
In particular, a finite quantity k is either a constant real number or k = c + e
where c is a unique real number and e is an infinitesimal.


Definition 15.6: For any finite number x in a super ordered field, the 
unique real number c such that x = c + e where e is zero or infinitesimal 
is called the standard part of x and is denoted by 


c = st(x). 


This allows us to specify the unique real number that is infinitely close to a
finite quantity, either equal to it or differing from it by an infinitesimal. There
are no infinitesimals in R, just as Cantor asserted. However, they do occur in
every super ordered field that extends the real numbers: an element k ∉ R is
either infinite, in which case 1/k is infinitesimal, or of the form c + e with e
infinitesimal. So formal mathematics guarantees the existence of infinitesimals
in every such extension. We now have a choice: to restrict the study of calculus
to real numbers, which leads to the standard and perfectly viable formulation
of analysis using epsilon-delta definitions, or to use infinitesimals in an
extended system.
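For the concrete super ordered field R(ε), which is R(x) with x renamed ε, the case of the structure theorem and the standard part can be read off from lowest-degree terms. If the numerator of an element has lowest term aεʲ and the denominator has lowest term bεⁱ, the element equals (a/b)εʲ⁻ⁱ times 1 + (an infinitesimal): it is infinite if j < i, finite with standard part a/b if j = i, and infinitesimal (standard part 0) if j > i. The Python sketch below is our own and covers only R(ε), not a general super ordered field; an element is given by two coefficient lists (numerator and denominator, lowest degree first), and the function names are hypothetical.

```python
from fractions import Fraction

def lowest_term(coeffs):
    """(degree, coefficient) of the lowest-degree non-zero term of a polynomial in eps."""
    for degree, c in enumerate(coeffs):
        if c != 0:
            return degree, Fraction(c)
    return None  # the zero polynomial

def classify(num, den):
    """Classify num/den in R(eps), where eps is a positive infinitesimal.

    Returns ("positive infinite", None), ("negative infinite", None),
    or ("finite", st) where st is the standard part.
    """
    bottom = lowest_term(den)
    if bottom is None:
        raise ValueError("the denominator must be non-zero")
    top = lowest_term(num)
    if top is None:
        return ("finite", Fraction(0))
    (j, a), (i, b) = top, bottom
    if j < i:
        # behaves like (a/b) * eps**(j - i): beyond every real number
        return ("positive infinite" if a / b > 0 else "negative infinite", None)
    if j == i:
        return ("finite", a / b)
    return ("finite", Fraction(0))  # an infinitesimal: standard part 0

print(classify([3, 1], [1]))        # 3 + eps
print(classify([-1], [0, 1]))       # -1/eps
print(classify([2, 0, 1], [1, 1]))  # (2 + eps**2)/(1 + eps)
```

These three calls give a finite quantity with standard part 3, a negative infinite quantity, and a finite quantity with standard part 2.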

In the applications of mathematics, infinitesimal quantities are often 
considered as variable points on the real number line. Cauchy took this view- 
point by defining an infinitesimal to be a variable quantity that becomes 
arbitrarily small. In modern notation, this idea can be represented as a null 
sequence, which is simply a sequence that tends to zero. 

Cauchy considered such a sequence to be a variable quantity: an infinitesimal.
From this he developed continuous functions and calculus. For
instance, he operated symbolically with a quantity α = (aₙ) by defining
f(x + α) to be the sequence of values f(x + aₙ). He defined a function f to be
continuous at x if f(x + α) − f(x) is infinitesimal whenever α is infinitesimal.
He then developed a theory of calculus using infinitesimals, even imagining
a number line with infinitesimal quantities upon it.
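Cauchy's operational view is easy to imitate with an ordinary sequence. In the Python sketch below, the choices f(x) = x², x = 3, and the null sequence aₙ = 1/n are ours, for illustration only: the terms f(x + aₙ) − f(x) = 2x/n + 1/n² form another sequence tending to zero, which is what Cauchy's continuity condition asks of f at x. Printing finitely many terms illustrates the behaviour; it proves nothing about the limit.

```python
from fractions import Fraction

def f(x):
    return x * x

def alpha(n):
    """The null sequence a_n = 1/n, standing in for Cauchy's variable infinitesimal."""
    return Fraction(1, n)

def increment(x, n):
    """The n-th term of the sequence f(x + alpha) - f(x)."""
    return f(x + alpha(n)) - f(x)

x = Fraction(3)
for n in (1, 10, 100, 1000):
    print(n, increment(x, n))    # 2x/n + 1/n**2, shrinking towards 0
```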

However, at his time in history, the notion of completeness of the real 
numbers had yet to be formalised and there was no obvious way to represent 
infinitesimals on a number line. The structure theorem for super ordered 
fields offers a solution. 


Visualising Infinitesimals on a Geometric Number Line


To visualise an infinitesimal in a super ordered field, we use the structure
theorem. In chapter 1, when we attempted to draw a physical picture of the
real number line, we realised that, on a given scale, two distinct points can
be so close together that to the human eye they are indistinguishable. A ruler
marked with centimetres and millimetres lets us distinguish between the mark
at 1.4 cm and the next mark at 1.5 cm. If we tried to mark √2 as accurately as
possible, we could mark it at approximately 1.41 cm, between 1.4 and 1.5. But
the difference between 1.414 cm and 1.4142 cm would be impossible to see with
ordinary implements. Our response was to magnify the line, to distinguish
between 1.414 and 1.415, with 1.4142 nestling between them. When performing
the magnification, we redrew the lines without making them thicker.


Fig. 15.4 Magnifying R (1.4142 seen between 1.414 and 1.415)


If we wish to see the difference between two extremely close numbers, say
1 and 1 + 1/10¹⁰⁰, we magnify the difference by a factor of 10¹⁰⁰. The map
m : R → R with m(x) = 10¹⁰⁰(x − 1) gives m(1) = 0 and m(1 + 1/10¹⁰⁰) = 1,
so the very close numbers 1 and 1 + 1/10¹⁰⁰ are mapped to the clearly
separated points 0 and 1.
Fig. 15.5 Seeing two extremely close points on R


More generally, we can magnify part of the real line by a huge scale factor
so that two very close real numbers can be seen as two separate points.
The same technique can be used in a super ordered field K by introducing
the map

$$m : K \to K, \qquad m(x) = \frac{x - a}{e} \quad \text{for any } a, e \in K,\ e \neq 0.$$
Now m(a) = 0 and m(a + e) = 1. Thus whatever the non-zero value of e, be it
finite, infinite, or infinitesimal, the map m sends a and a + e onto the
distinct points 0 and 1.

Usually we take e > 0, so that a+e > a, because this maintains the direction 
on the line so that for a < b we have m(a) < m(b). 
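The magnification map is easy to check with exact arithmetic. This Python sketch uses `fractions.Fraction` so that 1 and 1 + 1/10¹⁰⁰ stay distinct (floating-point numbers round them to the same value); `magnify` is a hypothetical name for m(x) = (x − a)/e, and the sketch works only in the real (here rational) numbers, not in a super ordered field.

```python
from fractions import Fraction

def magnify(x, a, e):
    """The map m(x) = (x - a)/e, which sends a to 0 and a + e to 1 (e != 0)."""
    if e == 0:
        raise ValueError("e must be non-zero")
    return (x - a) / e

a = Fraction(1)
e = Fraction(1, 10**100)
print(magnify(a, a, e), magnify(a + e, a, e))   # 0 1
print(1.0 + 1e-100 == 1.0)                       # True: floats cannot separate them
```

With e > 0 the map is increasing, so the order of points is preserved in the magnified picture.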


Definition 15.7: The e-lens pointed at a is the map m : K → K where

$$m(x) = \frac{x - a}{e}.$$

This map makes sense for any non-zero e; in particular, for infinitesimals.
If we take e to be a specific infinitesimal ε > 0, then an ε-lens can be used
to see infinitesimal detail on an extended number line. For instance, we may
imagine an extended number line K as a geometric number line, with the
origin, the natural numbers, the rationals, and reals all in their usual places.
Negative and positive infinite quantities are too far off to the left and right
to see on a normal scale, while the two points a, a + ε for a ∈ R and ε
infinitesimal are too close together to be marked separately. In figure 15.6
we have drawn a to the right of 1, but it could be anywhere else on the
extended number line K.


Fig. 15.6 The line to a normal scale 


Now use m(x) = (x − a)/ε to map the whole extended number line K onto
a second number line K.


Fig. 15.7 Magnifying the whole extended line 


This map sends a to m(a) = 0 and a + ε to the distinct point m(a + ε) = 1.
Meanwhile, the image of a general point x is (x − a)/ε, which may be finite or
infinite.
Definition 15.8: The field of view of the map m(x) = (x − a)/e, where
a, e ∈ K and e ≠ 0, is the set {x ∈ K | (x − a)/e is finite}.


The field of view of m is precisely the set of elements that map onto the 
finite part of K. Points outside the field of view map to infinite elements of K, 
which are too far to the left or right to be seen in a finite picture. 


Definition 15.9: If u, v ∈ K are both non-zero, then u is said to be of higher
order than v if u/v is infinitesimal. It is of lower order than v if u/v is infinite.
The two are of the same order if u/v is finite but not infinitesimal.


Example 15.10: If ε is infinitesimal, then ε² is of higher order than ε, and
1/ε is of lower order than any non-zero finite element. The element
17ε + 1066ε² is of the same order as 5ε + πε² + 10¹⁰⁰ε³.


In general, when using the map m(x) = (x − a)/e, points that differ from a
by a quantity of lower order than e (much larger than e) are mapped to
infinite quantities, points differing by a quantity of the same order as e are
mapped onto finite points that are not infinitely close to 0, and points
differing by a quantity of higher order than e are mapped onto points that
differ from 0 by an infinitesimal.
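The orders of Definition 15.9 are easy to compute for polynomials in a single positive infinitesimal ε: each non-zero polynomial is a non-zero real multiple of εʲ times 1 + (an infinitesimal), where j is its lowest exponent, so comparing two of them reduces to comparing those exponents. The Python sketch below works under that restriction (for quotients of polynomials one would subtract the denominator's exponent); the function names are ours. It reproduces example 15.10.

```python
import math

def valuation(coeffs):
    """Lowest power of eps with a non-zero coefficient in a non-zero polynomial in eps."""
    for power, c in enumerate(coeffs):
        if c != 0:
            return power
    raise ValueError("the zero polynomial has no order")

def compare_order(u, v):
    """Compare non-zero polynomials u and v in a positive infinitesimal eps.

    u/v is a non-zero real constant times eps**(valuation(u) - valuation(v))
    times (1 + infinitesimal), so the sign of that exponent decides the case.
    """
    d = valuation(u) - valuation(v)
    if d > 0:
        return "higher"   # u/v is infinitesimal
    if d < 0:
        return "lower"    # u/v is infinite
    return "same"         # u/v is finite but not infinitesimal

u = [0, 17, 1066]               # 17 eps + 1066 eps^2
v = [0, 5, math.pi, 10**100]    # 5 eps + pi eps^2 + 10^100 eps^3
print(compare_order(u, v))                 # same
print(compare_order([0, 0, 1], [0, 1]))    # eps^2 against eps: higher
```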

Because the human eye cannot see infinitesimal quantities, we can strip 
away infinitesimal differences by taking the standard part of the image of the 
map m. 


Definition 15.11: The optical lens o based on the e-lens m is the map from
the field of view of m to R given by

$$o(x) = \operatorname{st}(m(x)).$$


This maps the field of view onto the real line R. 
Fig. 15.8 An optical lens

Here we are interested in the case of an optical lens pointing at a ∈ R
with e = ε infinitesimal. The field of view consists of a and the points that
differ from a by infinitesimals of the same or higher order than ε. For any
r ∈ R, o(a + rε) = r, so the optical lens maps onto the whole of R. But finer
infinitesimal detail is lost: if δ is of higher order than ε, then the two
elements x and x + δ map to the same element of R.

One further technical convention makes the visual picture even simpler 
to grasp. When we make geographical maps, we draw a representation of 
a particular geographical region R on a map M. We can think of this as a 
function s : R —> M from the original region R to the physical map M. But 
when we mark the position of a specific place, such as the position of London 
on a map of the United Kingdom, we do not write s(London) on the map, we 
write the original name ‘London’. 

Using this convention, we modify the picture by naming the image points
in R with the same original names in K, on the understanding that what we
see in R is simply the standard part of the image of the original point. Now we
are able to ‘see’ points that are infinitely close in K by using an optical lens to
move them apart.


Fig. 15.9 Seeing infinitesimal detail (the points a, a + ε, and a + ε + 10¹⁰⁰ε² in the field of view)


The field of view is magnified to fill the whole real line. The image of a is
distinct from the image of a + ε, yet the latter has the same image as
a + ε + 10¹⁰⁰ε², even though the number 10¹⁰⁰ is immense in human terms.
Despite its vast size, it is still finite, so 10¹⁰⁰ε² is of higher order than ε
(its ratio to ε is infinitesimal).

In this representation we are again ‘abusing notation’ by denoting the im- 
age of an element x in the field of view by the same name x. However, by 
using this notation while being fully aware that the picture represents not 
only the physical image drawn on paper but the full meaning of the formal 
theory, we are offering a natural view of a formal concept. 
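The optical lens can be imitated in a restricted setting. The Python sketch below assumes that the elements being viewed are polynomials in a single positive infinitesimal ε with real coefficients (so they lie in R(ε)), and that the lens is pointed at a real number a with scale ε; the function name and representation are ours. It reproduces the picture just described: a + ε and a + ε + 10¹⁰⁰ε² have the same image, while a + 1 lies outside the field of view.

```python
from fractions import Fraction

def optical_lens(x, a):
    """o(x) = st((x - a)/eps) for x a polynomial in a positive infinitesimal eps.

    x is a coefficient list [c0, c1, c2, ...] meaning c0 + c1*eps + c2*eps**2 + ...
    Returns None when x lies outside the field of view, i.e. (x - a)/eps is infinite.
    """
    shifted = [x[0] - a] + list(x[1:])
    if shifted[0] != 0:
        return None            # (x - a)/eps has an eps**-1 term: infinite
    # (x - a)/eps = c1 + c2*eps + ..., whose standard part is c1
    return shifted[1] if len(shifted) > 1 else 0

a = Fraction(2)
print(optical_lens([a], a))                 # a itself -> 0
print(optical_lens([a, 1], a))              # a + eps -> 1
print(optical_lens([a, 1, 10**100], a))     # a + eps + 10^100 eps^2 -> 1
print(optical_lens([a + 1], a))             # a + 1 -> None (outside the field of view)
```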

We can go even further. When we attempt to draw a super ordered
field K as a line to an appropriate scale that allows us to distinguish between
real numbers, then all we can draw is (part of) the line L consisting of the
finite elements. Let L be the subset of finite elements of K and I the subset
consisting of 0 and the infinitesimals.
Theorem 15.12: The standard part map st : L → R maps L onto the whole
of R and is a ring homomorphism satisfying

$$\operatorname{st}(x + y) = \operatorname{st}(x) + \operatorname{st}(y), \qquad \operatorname{st}(xy) = \operatorname{st}(x)\operatorname{st}(y),$$
$$\operatorname{st}(x/y) = \operatorname{st}(x)/\operatorname{st}(y) \quad (\text{for } \operatorname{st}(y) \neq 0).$$


Proof: This is left as an exercise.


As a map of additive groups, st : L → R is a group homomorphism with
kernel I, and it maps onto the whole of R. By the structure theorem for group
homomorphisms (chapter 13, theorem 13.36) L/I is isomorphic to R as an
additive group whose elements are cosets of the form x + I for x ∈ L.

Defining the relation x ~ y if and only if x − y ∈ I, the equivalence
class x + I contains precisely one real number, st(x). The equivalence class
containing a real number a is called the ‘monad’¹ or ‘halo’ around a and will
be denoted by Mₐ. This is the cluster of points around a consisting of a and
every other element that differs from it by an infinitesimal.

The order on L satisfies x < y, where x = a + ε and y = b + δ with a, b ∈ R
and ε, δ ∈ I, if and only if either a < b, or a = b and ε < δ. This allows us to
see why a super ordered field cannot be complete. In a negative sense we
already know that a complete ordered field cannot contain an infinitesimal.
However, the notion of a monad offers a positive proof that completeness fails.


Theorem 15.13: A super ordered field K is not complete. 


Proof: Every monad Mₐ for a ∈ R is non-empty (because a ∈ Mₐ) and
bounded above by any b ∈ R where b > a. However it cannot have a least
upper bound c ∈ K. Such a c would be finite (because a ≤ c ≤ b), so c ∈ M_d
for some real d ≥ a. Let ε be a positive infinitesimal. If d = a, then c + ε lies
in Mₐ and is bigger than c, so c is not an upper bound. If d > a, then c − ε is
smaller than c but still exceeds every element of Mₐ, so c is not the least
upper bound. Hence a super ordered field K contains subsets that are
bounded above but have no least upper bound in K.


In our mind’s eye, we can imagine a super ordered field as a number line
in which the finite part consists of the real numbers, each surrounded by a
halo of elements differing from it by an infinitesimal quantity. The standard
part map allows us to collapse the monads into single real numbers to
visualise the real number line in the usual mathematical representation.
Optical microscopes allow us to see infinitesimal detail magnified to an
appropriate visible level.

¹ Leibniz used the term ‘monad’ in his philosophical theory to specify indivisible entities
that make up the entire universe of thought. While these equivalence classes consist of tiny
elements too small for the human eye to perceive, they differ from Leibniz’s monads,
as each one consists of an infinite set of elements.

We can now see how the notion of infinitesimal quantities that evolved 
over many centuries can be re-evaluated using the notion of super ordered 
field. In developing the calculus, Leibniz conceived of the idea of an in- 
finitesimal quantity that is arbitrarily small in size. Then Euler produced 
remarkable results by thinking of an infinitesimal as a symbol that he could 
manipulate using algebraic rules. The first example given in this chapter be- 
gins with the field R(x) where elements are manipulated purely symbolically 
and x is an infinitesimal, as in figure 15.10(1). 


Fig. 15.10 Four isomorphic representations: (1) elements of R(x) as algebraic expressions, including k ∈ R, x, x², with general term (aₙxⁿ + ⋯ + a₀)/(bₘxᵐ + ⋯ + b₀), bₘ ≠ 0; (2) as graphs; (3) as constants and variable quantities on the line x = v; (4) as points on an extended number line


We moved on from the algebraic manipulation of symbols in R(x) to visu- 
alise the corresponding rational functions as graphs as in figure 15.10(2). Now 
an infinitesimal is a whole graph and we compare the order of items by how 
the graphs are ordered a little to the right of the origin. 
Next we considered where the graphs of rational functions meet the vertical
line x = v. As v gets small, constant functions meet the line in a fixed
point but variable functions meet the vertical line in a variable point where
the order is determined by what happens as v gets small as in figure 15.10(3).
This example is consistent with the idea of a number line including constant
quantities that are real numbers and variable quantities that can include
infinitesimals.

More generally, the structure theorem for any super ordered field reveals
how we can imagine an infinitesimal as a point on a number line, drawn
either horizontally or vertically. Figure 15.10(4) shows a vertical presentation
as the ultimate form of the super ordered field R(ε) where ε is an infinitesimal.

However, this visualisation of an infinitesimal now works not just in the
example of R(ε), but in any super ordered field K.


Magnification in Higher Dimensions 
The idea of infinite magnification on the extended number line K can
easily be used in two or more dimensions by using e-lenses on each axis
separately.


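Definition 15.14: The ε-δ-lens pointed at (a, b) ∈ K² is the map
m : K² → K² given by

$$m(x, y) = \left(\frac{x - a}{\varepsilon}, \frac{y - b}{\delta}\right).$$

The optical ε-δ-lens pointed at (a, b) ∈ K² is the map o : K² → R² given by

$$o(x, y) = \left(\operatorname{st}\frac{x - a}{\varepsilon}, \operatorname{st}\frac{y - b}{\delta}\right).$$

The field of view of an optical ε-δ-lens pointed at (a, b) ∈ K² is the set

$$\left\{(x, y) \in K^2 \;\middle|\; \frac{x - a}{\varepsilon},\ \frac{y - b}{\delta} \text{ are both finite}\right\}.$$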


The elements a, b, ε, δ may be any elements in K provided that ε and δ
are non-zero. For example, we can choose a or b to be infinite to view the
situation ‘at infinity’, or we can choose ε and δ to be infinitesimal to look at
‘infinitesimal detail’.


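Example 15.15: Let f(x) = x² and suppose that x ∈ R and ε is infinitesimal.
Then an optical ε-δ-lens with ε = δ, pointed at (x, x²), sees a nearby
point (x + h, (x + h)²) as

$$\begin{aligned} o(x + h, (x + h)^2) &= \left(\operatorname{st}\frac{h}{\varepsilon}, \operatorname{st}\frac{(x + h)^2 - x^2}{\varepsilon}\right) \\ &= \left(\operatorname{st}\frac{h}{\varepsilon}, \operatorname{st}\frac{2xh + h^2}{\varepsilon}\right) \\ &= \left(\operatorname{st}\frac{h}{\varepsilon}, \operatorname{st}(2x + h)\,\operatorname{st}\frac{h}{\varepsilon}\right). \end{aligned}$$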


If this point is in the field of view, then h/ε must be finite, so h is either
zero or of the same or higher order than ε; in particular h is zero or
infinitesimal and st(2x + h) = 2x. Writing λ = st(h/ε), this gives

$$o(x + h, (x + h)^2) = (\lambda, 2x\lambda).$$

So under the optical lens, the field of view is mapped onto the whole of a
straight line, represented parametrically by λ(1, 2x) for any real number λ.
In representing the picture of the map o from the field of view as a subset
of K² to the real plane R², we again use the convention that the image
o(x, f(x)) is denoted by (x, f(x)) and the image o(x + h, f(x + h)) is denoted
by (x + h, f(x + h)). The optical lens magnifies an infinitesimal part of the
graph, centred on (x, f(x)), to an infinite straight line in R² passing through
(x, f(x)) with slope 2x.


Fig. 15.11 Magnifying a locally straight graph to see a full straight line (an optical microscope maps the graph in K² near (x, f(x)) to a real line in R² with slope f′(x) = 2x)
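The computation in example 15.15 can be mechanised for any polynomial f, in the simplest case h = λε with λ real (the field of view also contains points such as h = λε + ε², which have the same image). The Python sketch below stores elements of R(ε) that are polynomials in ε as coefficient lists, lowest degree first; the function names are ours, and the restriction to polynomial f is an assumption that keeps the arithmetic finite. It returns (λ, f′(x)λ), a point on the line through the image of (x, f(x)) with slope f′(x).

```python
from fractions import Fraction

def add(p, q):
    """Sum of two polynomials in eps given as coefficient lists (lowest degree first)."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]

def mul(p, q):
    """Product of two polynomials in eps given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def value_at_shift(f, x, lam):
    """f(x + lam*eps) as a polynomial in eps, using Horner's rule.

    f is an ordinary real polynomial given by its coefficients, lowest degree first.
    """
    point = [x, lam]          # the element x + lam*eps
    value = [0]
    for c in reversed(f):
        value = add(mul(value, point), [c])
    return value

def optical_image(f, x, lam):
    """o(x + h, f(x + h)) for the optical eps-eps-lens at (x, f(x)), with h = lam*eps.

    (x + h - x)/eps = lam exactly; for the second coordinate, subtracting f(x)
    (the constant term y[0]) and dividing by eps leaves y[1] + y[2]*eps + ...,
    whose standard part is y[1].
    """
    y = value_at_shift(f, x, lam)
    return (lam, y[1] if len(y) > 1 else 0)

square = [0, 0, 1]                                        # f(x) = x**2
print(optical_image(square, Fraction(3), Fraction(5)))    # (Fraction(5, 1), Fraction(30, 1))
```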


Calculus with Infinitesimals 


This experience suggests that we may be able to do calculus logically with
infinitesimals. However, one more step is needed to make this fully operational.
When calculating the derivative of a function f(x), we form the ratio

$$\frac{f(x + h) - f(x)}{h}$$

for infinitesimal h and take the standard part. To do this, we must be able to
calculate f not only for elements in R, but also for elements in the extension
field K.
For instance, if f(x) = x³ then, for infinitesimal h,

$$\frac{f(x + h) - f(x)}{h} = \frac{(x + h)^3 - x^3}{h} = 3x^2 + 3xh + h^2,$$

with standard part 3x².

This calculation can be performed in the extension field R(ε) with the
infinitesimal h = ε because (f(x + ε) − f(x))/ε is a rational function in ε.
However, if we are to consider functions other than rational functions, then
we need a more powerful theory.
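For polynomial f the calculation of f(x + h) − f(x) divided by h can be carried out symbolically: expand f(x + h) by the binomial theorem, cancel f(x), divide by h, and take the standard part, which for infinitesimal h is the constant term of the resulting polynomial in h. A Python sketch, assuming f is given by its real coefficients (lowest degree first) and x is rational so that `Fraction` gives exact arithmetic; the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def difference_quotient(coeffs, x):
    """(f(x + h) - f(x))/h as a list of coefficients in powers of h.

    f(x + h) = sum_k c_k (x + h)**k = sum_k sum_j c_k C(k, j) x**(k - j) h**j;
    dropping the j = 0 terms (which make up f(x)) and dividing by h
    sends h**j to h**(j - 1).
    """
    x = Fraction(x)
    quotient = [Fraction(0)] * max(len(coeffs) - 1, 1)
    for k, c in enumerate(coeffs):
        for j in range(1, k + 1):
            quotient[j - 1] += c * comb(k, j) * x ** (k - j)
    return quotient

def standard_part(poly_in_h):
    """For infinitesimal h the standard part of a polynomial in h is its constant term."""
    return poly_in_h[0]

cube = [0, 0, 0, 1]                                  # f(x) = x**3
print(difference_quotient(cube, 2))                  # 3x^2 + 3xh + h^2 at x = 2, as Fractions
print(standard_part(difference_quotient(cube, 2)))   # 12
```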

The standard functions in calculus such as sin x and cos x can be
represented as power series:

$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots, \qquad \cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots.$$

These can be handled in the field R((ε)) consisting of power series in ε with
a finite number of negative powers:

$$a_k \varepsilon^{-k} + \cdots + a_1 \varepsilon^{-1} + b_0 + b_1 \varepsilon + \cdots + b_n \varepsilon^n + \cdots$$

for an integer k > 0. This extension field serves for functions given by
rational functions or power series given as combinations of polynomials,
trigonometric functions, exponentials, logarithms, and so on, as encountered
in school calculus.
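A computer can hold only finitely many terms of a power series, but the standard part of a finite element of R((ε)) depends only on its terms up to ε⁰, so truncated series suffice for illustration. In this Python sketch (our own; an element is stored as a dictionary from exponent to coefficient, and only leading terms are kept), dividing by a power of ε shifts exponents, and the standard part is the ε⁰ coefficient, provided no negative powers survive.

```python
from fractions import Fraction
from math import factorial

def sin_series(terms):
    """Leading terms of sin(eps) = eps - eps**3/3! + ... as {exponent: coefficient}."""
    return {2 * k + 1: Fraction((-1) ** k, factorial(2 * k + 1)) for k in range(terms)}

def cos_series(terms):
    """Leading terms of cos(eps) = 1 - eps**2/2! + ... as {exponent: coefficient}."""
    return {2 * k: Fraction((-1) ** k, factorial(2 * k)) for k in range(terms)}

def shift(series, power):
    """Multiply a series by eps**power (power may be negative)."""
    return {p + power: c for p, c in series.items()}

def standard_part(series):
    """Standard part of a finite element: its eps**0 coefficient."""
    if any(p < 0 and c != 0 for p, c in series.items()):
        raise ValueError("this element is infinite: it has a negative power of eps")
    return series.get(0, Fraction(0))

print(standard_part(shift(sin_series(6), -1)))                       # sin(eps)/eps -> 1
one_minus_cos = {p: -c for p, c in cos_series(6).items() if p != 0}  # 1 - cos(eps)
print(standard_part(shift(one_minus_cos, -2)))                       # (1 - cos eps)/eps**2 -> 1/2
```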

However, this still does not cope with all possible functions. For instance,
a sequence a₁, a₂, …, aₙ, … is a function a : N → R where a(n) = aₙ. How do
we extend this sequence to work in an appropriate extension field?

In calculus, if we wish to calculate the derivative of a general function
f : D → R, we form the quotient

$$\frac{f(x + h) - f(x)}{h}$$

where x ∈ D and h is infinitesimal.

This was no problem for Leibniz as his functions were given by a for- 
mula and he assumed that the same formula would work for infinitesimals. 
But modern mathematical analysis works with general functions defined 
set-theoretically that may not have a simple formula. 
Now we need to extend a set-theoretic function f : D → R to a larger
domain f : *D → K where the extended domain *D contains not only real
numbers but also elements x + h where h is infinitesimal. This is not all
that is required. For example, a sequence (sₙ) is a function s : N → R where
sₙ = s(n) and we need to consider how to extend such functions in an
appropriate way.


Non-standard Analysis 


Abraham Robinson introduced a theory in 1966 [29], called non-standard 
analysis. Whereas standard mathematical analysis only uses the real num- 
bers, non-standard analysis works in a super ordered field called the 
hyperreals, denoted by the symbol *R. 

The techniques for constructing the extension from R to *R are essentially 
the same as those used in chapter 9 to construct the extension from Q to R. 
This began with Cauchy sequences in Q and putting an equivalence relation 
on them so that the equivalence classes became elements of R. To construct 
*R from R, we begin with the set S of all sequences (a,) for a, € R. Such 
a sequence is a function s : N —> R where a, = s(n), so the full set of such 
sequences is S = RN, 

We introduce an equivalence relation on S so that the equivalence classes
become the elements of *R. The equivalence class containing (aₙ) is written
as [aₙ] or as [a₁, a₂, …, aₙ, …] and we embed R in *R by identifying a ∈ R
with the element [a, a, …, a, …].

The construction requires us to define a relation (aₙ) ~ (bₙ) on S satisfying
the usual properties of an equivalence relation:


(E1) (aₙ) ~ (aₙ) for all (aₙ) ∈ S
(E2) If (aₙ) ~ (bₙ) then (bₙ) ~ (aₙ)
(E3) If (aₙ) ~ (bₙ), (bₙ) ~ (cₙ) then (aₙ) ~ (cₙ).


Then we need to define the usual operations of addition, multiplication, 
and order on the equivalence classes as elements of *R to make it into an 
ordered field extension of R. 

We could begin by suggesting that (aₙ) ~ (bₙ) if (aₙ), (bₙ) agree at all but
a finite number of places.


Definition 15.16: A subset T ⊆ N is said to be cofinite if its complement
Tᶜ = N \ T is finite.
As a first step, we require that (aₙ) ~ (bₙ) if aₙ = bₙ for n in a cofinite
set. For example, if we took a sequence (aₙ) and changed a finite number of
terms to get a sequence (bₙ), then [aₙ] = [bₙ]. In particular, if m is the largest
index such that aₘ ≠ bₘ, then aₙ = bₙ for all n > m, meaning that the terms
of the two sequences are identical from some point on.

However, we would need to decide what to do with sequences such as 


(aₙ) = (1, 0, 1, 0, …) and (bₙ) = (1, 1, 1, 1, …).


Do we claim that (aₙ) ~ (bₙ) or not? In this case they are equal for n ∈ O
(the odd numbers) and they are different for n ∈ E (the even numbers). To
make a decision requires us to make a choice. If we choose to focus only on O,
they are equal, but if we focus only on E, they are different.

The clever idea that Robinson conceived can be expressed in a simple way.
For every subset T ⊆ N, he decided that he must make the choice between
what happens on T and what happens on its complement Tᶜ = N \ T. His
approach amounts to assuming that it is possible to select a collection U of
subsets of N so that precisely one of T and its complement Tᶜ is in U. Then a
statement such as [aₙ] = [bₙ] would be declared to be true if the set
T = {n ∈ N | aₙ = bₙ} is in U, and false if Tᶜ ∈ U. This leads to the following
definition:


Definition 15.17: (aₙ) ~ (bₙ) if and only if {n ∈ N | aₙ = bₙ} ∈ U.
The choice of U may not be unique, for we may have one choice of U in
which the odd numbers O ∈ U, in which case
[1, 0, 1, 0, …] = [1, 1, 1, 1, …]
and another in which the even numbers E ∈ U, in which case
[1, 0, 1, 0, …] ≠ [1, 1, 1, 1, …].


This means that we may have different ways of constructing an appropriate 
extension field and the choice may not be unique. However, what matters 
is that the choice is fit for purpose. So we continue by asking what kind of 
properties are required. 

First, U is a set of subsets of N so U ⊆ P(N) and for every T ⊆ N we
require:


(U1) If T ⊆ N then either T ∈ U or N \ T ∈ U, but not both.
We also require:


(U2) If T is cofinite, then T ∈ U.
It is clear that if a statement is true on a set T ∈ U, then the set on which it
is true contains T, so any set containing a member of U should also be in U.
This requires:


(U3) If T ∈ U and T ⊆ S ⊆ N then S ∈ U.


We must then check to confirm that we have an equivalence relation satisfying
(E1)-(E3). In the proof that follows later, we will find that we require a
further condition:


(U4) T₁, T₂ ∈ U ⇒ T₁ ∩ T₂ ∈ U.


We will shortly see that these conditions are all that are required, so we 
make the definition: 


Definition 15.18: An ultrafilter on N is a collection U of subsets of N
satisfying:


(U1) if T ⊆ N then either T ∈ U or Tᶜ ∈ U, but not both
(U2) if T is cofinite, then T ∈ U
(U3) if T ∈ U and T ⊆ S ⊆ N then S ∈ U
(U4) T₁, T₂ ∈ U ⇒ T₁ ∩ T₂ ∈ U.


We postpone the discussion of how to construct such an ultrafilter until the next chapter, where we discuss more sophisticated methods appropriate for the task. For the rest of this chapter, we assume that we have an ultrafilter satisfying (U1)-(U4) and consider how the theory works. We begin with a lemma:


Lemma 15.19: If $U$ is an ultrafilter on $\mathbb{N}$, then the relation
$$(a_n) \sim (b_n) \text{ if and only if } \{n \in \mathbb{N} \mid a_n = b_n\} \in U$$
given in definition 15.17 is an equivalence relation on the set $S$ of all real sequences.


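Proof: To prove (E1), let $(a_n) \in S$; then
$$T = \{n \in \mathbb{N} \mid a_n = a_n\} = \mathbb{N},$$
so $T$ is cofinite and, by (U2), $T \in U$; hence $(a_n) \sim (a_n)$ for all $(a_n) \in S$.

(E2) If $(a_n) \sim (b_n)$, then $T = \{n \in \mathbb{N} \mid a_n = b_n\} \in U$. Since this is also the set $\{n \in \mathbb{N} \mid b_n = a_n\}$, we have $(b_n) \sim (a_n)$.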

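(E3) If $(a_n) \sim (b_n)$ and $(b_n) \sim (c_n)$ then
$$a_n = b_n \text{ for all } n \text{ in some set } T_1 \in U$$
and
$$b_n = c_n \text{ for all } n \text{ in some set } T_2 \in U,$$
so $a_n = c_n$ for all $n \in T_1 \cap T_2$. By (U4), $T_1 \cap T_2 \in U$, and since $\{n \in \mathbb{N} \mid a_n = c_n\}$ contains $T_1 \cap T_2$, (U3) gives $\{n \in \mathbb{N} \mid a_n = c_n\} \in U$, that is, $(a_n) \sim (c_n)$.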
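The step in (E3) rests on the fact that wherever $a_n = b_n$ and $b_n = c_n$ both hold, $a_n = c_n$ holds too, so the agreement set of $(a_n)$ and $(c_n)$ contains $T_1 \cap T_2$. The inclusion can be strict, which is why (U3), in its upward-closed form, is needed as well as (U4). The Python sketch below checks this on a finite window; the three sequences are chosen arbitrarily for illustration and are not taken from the text.

```python
def agreement(a, b, cutoff):
    """Indices n <= cutoff with a(n) == b(n)."""
    return {n for n in range(1, cutoff + 1) if a(n) == b(n)}


def seq_a(n: int) -> int:
    return n % 3  # 1, 2, 0, 1, 2, 0, ...


def seq_b(n: int) -> int:
    # agrees with seq_a on odd n and on multiples of 6
    return 0 if n % 2 == 0 else n % 3


def seq_c(n: int) -> int:
    # agrees with seq_a except on multiples of 5 that are not multiples of 15
    return 0 if n % 5 == 0 else n % 3


N = 30
T1 = agreement(seq_a, seq_b, N)
T2 = agreement(seq_b, seq_c, N)
T3 = agreement(seq_a, seq_c, N)
print((T1 & T2) <= T3)  # True: T1 ∩ T2 is contained in T3
print((T1 & T2) < T3)   # True here: e.g. 2 is in T3 but not in T1
```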


Having proved that definition 15.17 gives an equivalence relation on $S$, we define ${}^*\mathbb{R}$ to be the set of equivalence classes. In particular, if the equivalence class containing $(a_n)$ is denoted by $[a_n]$, then we have
$$[a_n] = [b_n] \text{ if and only if } \{n \in \mathbb{N} \mid a_n = b_n\} \in U.$$


Now we need to define the field operations on ${}^*\mathbb{R}$, check that they are well defined, and prove that they satisfy the axioms for a field.

Proposition 15.20: The set ${}^*\mathbb{R}$ with operations on equivalence classes given by
$$[a_n] + [b_n] = [a_n + b_n], \qquad [a_n][b_n] = [a_n b_n]$$
is a field containing $\mathbb{R}$ as a subfield.


Proof: First, the operations are well defined, because if $[a_n] = [a'_n]$ and $[b_n] = [b'_n]$ then the sets $T_1 = \{n \in \mathbb{N} \mid a_n = a'_n\}$ and $T_2 = \{n \in \mathbb{N} \mid b_n = b'_n\}$ satisfy $T_1, T_2 \in U$, so that $a_n + b_n = a'_n + b'_n$ for $n \in T_1 \cap T_2$. By (U4), $T_1 \cap T_2 \in U$, and by (U3) so is the larger set $\{n \in \mathbb{N} \mid a_n + b_n = a'_n + b'_n\}$, so $[a_n + b_n] = [a'_n + b'_n]$.

The proof for the product is similar.

The proofs of commutativity, associativity, and distributivity of addition and multiplication are straightforward. (You should explain them to yourself.) The zero of ${}^*\mathbb{R}$ is $[0, 0, \ldots, 0, \ldots]$, the unit is $[1, 1, \ldots, 1, \ldots]$, and $\mathbb{R}$ can be embedded in ${}^*\mathbb{R}$ by identifying $a \in \mathbb{R}$ with $[a, a, \ldots, a, \ldots]$. The additive inverse of $[a_n]$ is $[-a_n]$.

The only difficult part is to define the multiplicative inverse $1/[a_n]$ of $[a_n]$, because the simple solution of defining $1/[a_n]$ to be $[1/a_n]$ will not work if any of the $a_n$ are zero. To cope with this, we note that $[a_n] = [0]$ if and only if the set
$$T = \{n \in \mathbb{N} \mid a_n = 0\} \in U.$$
If $[a_n] \neq [0]$, then $T \notin U$ and, by (U1), the set $T^c = \{n \in \mathbb{N} \mid a_n \neq 0\} \in U$. Let
$$b_n = \begin{cases} a_n & \text{if } a_n \neq 0, \\ 1 & \text{if } a_n = 0. \end{cases}$$
Then $b_n \neq 0$ for all $n$. Because $a_n = b_n$ precisely for $n \in T^c$ and $T^c \in U$, by definition $[a_n] = [b_n]$. Now define
$$1/[a_n] = [1/b_n].$$
Then $[a_n][1/b_n] = [b_n][1/b_n] = [1]$, as required.

This completes the proof that ${}^*\mathbb{R}$ is a field and $\mathbb{R}$ may be embedded as a subfield by identifying $a \in \mathbb{R}$ with $[a, a, \ldots, a, \ldots] \in {}^*\mathbb{R}$.
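The construction of $1/[a_n]$ can be imitated term by term. In the Python sketch below (illustrative names; a finite window of terms stands in for the whole sequence), zero terms of $(a_n)$ are replaced by 1 to give $(b_n)$, and the product $a_n \cdot (1/b_n)$ equals 1 exactly at the indices where $a_n \neq 0$, that is, on $T^c$. For the sample sequence $a_n = n \bmod 3$ both $T$ and $T^c$ are infinite, so whether $[a_n] = [0]$ depends on $U$; the sketch simply assumes $T^c \in U$.

```python
from fractions import Fraction


def safe_reciprocal_terms(a_terms):
    """Given a_1..a_N, return 1/b_1..1/b_N where b_n = a_n if a_n != 0, else 1."""
    b_terms = [x if x != 0 else Fraction(1) for x in a_terms]
    return [1 / x for x in b_terms]


a_terms = [Fraction(n % 3) for n in range(1, 10)]  # 1, 2, 0, 1, 2, 0, ...
inv = safe_reciprocal_terms(a_terms)
products = [x * y for x, y in zip(a_terms, inv)]

ones_at = {n for n, p in enumerate(products, start=1) if p == 1}
nonzero_at = {n for n, x in enumerate(a_terms, start=1) if x != 0}
print(ones_at == nonzero_at)  # True: a_n * (1/b_n) = 1 exactly on T^c
```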


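It is now only necessary to extend the order from $\mathbb{R}$ to make ${}^*\mathbb{R}$ an ordered field.

Definition 15.21: ${}^*\mathbb{R}^+ = \{[a_n] \in {}^*\mathbb{R} \mid \{n \in \mathbb{N} \mid a_n \geq 0\} \in U\}$ or, equivalently,
$$[a_n] \geq [b_n] \text{ if and only if } \{n \in \mathbb{N} \mid a_n \geq b_n\} \in U.$$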


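Theorem 15.22: ${}^*\mathbb{R}$ is a super ordered field.
Proof: We first need to check that the order is well defined and then that it satisfies the standard properties of order.

To check that it is well defined, we must show that if $[a_n] = [a'_n]$ and $[b_n] = [b'_n]$ then $[a_n] \geq [b_n]$ is the same as $[a'_n] \geq [b'_n]$.

If $[a_n] = [a'_n]$ and $[b_n] = [b'_n]$ then $T_1 = \{n \in \mathbb{N} \mid a_n = a'_n\}$ and $T_2 = \{n \in \mathbb{N} \mid b_n = b'_n\}$ satisfy $T_1, T_2 \in U$. So
$$a_n = a'_n \text{ and } b_n = b'_n \text{ for } n \in T_1 \cap T_2.$$
By (U4), $T_1 \cap T_2 \in U$.
If $[a_n] \geq [b_n]$ and $T_3 = \{n \in \mathbb{N} \mid a_n \geq b_n\}$, then $T_3 \in U$ and by (U4) again
$$T = (T_1 \cap T_2) \cap T_3 \in U.$$
For $n \in T$ we have $a_n = a'_n$, $b_n = b'_n$, and $a_n \geq b_n$, which gives $a'_n \geq b'_n$ for $n \in T$. Since $T \in U$, (U3) gives $\{n \in \mathbb{N} \mid a'_n \geq b'_n\} \in U$, so $[a'_n] \geq [b'_n]$, as required. The converse follows by exchanging the roles of the primed and unprimed sequences.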
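Now consider the standard properties of order:

(O1) $[a_n], [b_n] \in {}^*\mathbb{R}^+ \Rightarrow [a_n] + [b_n],\ [a_n][b_n] \in {}^*\mathbb{R}^+$
(O2) $[a_n] \in {}^*\mathbb{R} \Rightarrow [a_n] \in {}^*\mathbb{R}^+$ or $-[a_n] \in {}^*\mathbb{R}^+$
(O3) If $[a_n] \in {}^*\mathbb{R}^+$ and $-[a_n] \in {}^*\mathbb{R}^+$ then $[a_n] = [0]$.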


To prove (O1), suppose that $[a_n], [b_n] \in {}^*\mathbb{R}^+$; then, by definition 15.21,
$$T_1 = \{n \in \mathbb{N} \mid a_n \geq 0\} \in U, \qquad T_2 = \{n \in \mathbb{N} \mid b_n \geq 0\} \in U$$
and, by (U4),
$$T = T_1 \cap T_2 \in U,$$
so for $n \in T$ we have
$$a_n + b_n \geq 0 \text{ and } a_n b_n \geq 0, \text{ where } T \in U.$$
Using (U3) and definition 15.21 again, this gives
$$[a_n] + [b_n] = [a_n + b_n] \in {}^*\mathbb{R}^+, \qquad [a_n][b_n] = [a_n b_n] \in {}^*\mathbb{R}^+$$
as required.
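To prove (O2), suppose $[a_n] \in {}^*\mathbb{R}$, and let
$$T = \{n \in \mathbb{N} \mid a_n \geq 0\}.$$
By (U1), either $T \in U$, in which case, by definition 15.21, $[a_n] \in {}^*\mathbb{R}^+$, or
$$T^c = \{n \in \mathbb{N} \mid a_n < 0\} \in U,$$
in which case, by (U3), the larger set
$$\{n \in \mathbb{N} \mid -a_n \geq 0\} \in U$$
and so
$$-[a_n] = [-a_n] \in {}^*\mathbb{R}^+.$$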


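To prove (O3), suppose that $[a_n] \in {}^*\mathbb{R}^+$ and $-[a_n] \in {}^*\mathbb{R}^+$; then by definition 15.21,
$$T_1 = \{n \in \mathbb{N} \mid a_n \geq 0\} \in U, \qquad T_2 = \{n \in \mathbb{N} \mid -a_n \geq 0\} \in U.$$
Again, by (U4), we have
$$T = T_1 \cap T_2 \in U,$$
so we have
$$a_n \geq 0 \text{ and } -a_n \geq 0 \text{ for } n \in T,$$
which gives
$$a_n = 0 \text{ for } n \in T, \text{ where } T \in U,$$
and hence, by (U3), $[a_n] = [0]$.

This completes the proof.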


Once ${}^*\mathbb{R}$ has been shown to be a super ordered field, the floodgates open. For example, we can define $\omega = [1, 2, 3, \ldots, n, \ldots]$; then clearly $\omega$ is infinite because, for any real number $k$, $n > k$ for all but finitely many $n$, so the set $\{n \in \mathbb{N} \mid n > k\}$ is cofinite and lies in $U$, giving $\omega > k$. Furthermore, $1/\omega = [1, \frac{1}{2}, \ldots, \frac{1}{n}, \ldots]$ is an infinitesimal, and $\omega + 1 = [2, 3, 4, \ldots, n + 1, \ldots]$ satisfies $\omega + 1 > \omega$, where $\omega + 1 \neq \omega$ because the $n$th terms $n + 1$ and $n$ are always different. Now these elements have a full arithmetic and you may check that
$$\ldots < \omega - 2 < \omega - 1 < \omega < \omega + 1 < \cdots < 2\omega - 1 < 2\omega < \cdots < \omega^2 < \cdots,$$
and so on.
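Each inequality in this chain holds term by term from some index onwards, so the set of indices where it holds is cofinite and therefore lies in $U$ by (U2) and (U3), whichever ultrafilter is chosen. The Python sketch below (the helper `first_index_from` and the window size are illustrative assumptions) finds, for several links of the chain, the index from which the termwise inequality holds throughout a finite window. A finite window is evidence, not a proof; the proof is one line of algebra, such as $2n < n^2$ for $n \geq 3$.

```python
from typing import Callable, Optional

Seq = Callable[[int], int]


def omega(n: int) -> int:
    return n  # ω = [1, 2, 3, ..., n, ...]


def first_index_from(a: Seq, b: Seq, window: int = 1000) -> Optional[int]:
    """Smallest n0 with a(n) < b(n) for every n0 <= n <= window, or None if there is none."""
    n0 = None
    for n in range(1, window + 1):
        if a(n) < b(n):
            if n0 is None:
                n0 = n  # start of a run of successes
        else:
            n0 = None   # a failure breaks the run
    return n0


chain = [
    ("ω-1 < ω", lambda n: omega(n) - 1, omega),
    ("ω < ω+1", omega, lambda n: omega(n) + 1),
    ("ω+1 < 2ω-1", lambda n: omega(n) + 1, lambda n: 2 * omega(n) - 1),
    ("2ω-1 < 2ω", lambda n: 2 * omega(n) - 1, lambda n: 2 * omega(n)),
    ("2ω < ω²", lambda n: 2 * omega(n), lambda n: omega(n) ** 2),
]
for label, a, b in chain:
    print(label, "holds termwise from n =", first_index_from(a, b))
```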

You will be able to show that there are elements of different orders; for example, $\omega^2 = [1, 4, 9, \ldots, n^2, \ldots]$ is a higher-order infinite element than $\omega$, and if $\varepsilon = 1/\omega$ then $\varepsilon^2 = [1, \frac{1}{4}, \ldots, \frac{1}{n^2}, \ldots]$ is of lower order than $\varepsilon$.

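The hyperreals ${}^*\mathbb{R}$ are much more powerful than the original conception of Leibniz, who imagined that infinitesimal elements were first order, second order, and so on. This is true in the field of rational functions $\mathbb{R}(\varepsilon)$ where $\varepsilon$ is infinitesimal. If we take the order of $\varepsilon$ to be 1, then $\varepsilon^n$ is of order $n$. But there is no element in $\mathbb{R}(\varepsilon)$ whose square is $\varepsilon$. However, in ${}^*\mathbb{R}$, the element $\varepsilon = [1, \frac{1}{2}, \ldots, \frac{1}{n}, \ldots]$ has square root
$$\sqrt{\varepsilon} = \left[1, \tfrac{1}{\sqrt{2}}, \ldots, \tfrac{1}{\sqrt{n}}, \ldots\right],$$
and every non-negative element in ${}^*\mathbb{R}$ has a square root.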

Furthermore, any real function $f : D \to \mathbb{R}$ can be extended very naturally to a function on ${}^*\mathbb{R}$. The method is astonishingly simple. First, let ${}^*D$ be the set of elements of the form $[x_n]$ where all the $x_n$ are in $D$, giving
$${}^*D = \{[x_n] \in {}^*\mathbb{R} \mid x_n \in D\}.$$
Then extend $f$ to ${}^*D$ by defining
$$f([x_n]) = [f(x_n)].$$
How breathtakingly beautiful this is! The extension ${}^*D$ of $D$ is made up from equivalence classes whose elements are sequences in $D$, and the extended function $f : {}^*D \to {}^*\mathbb{R}$ is defined in a natural way using these sequences, whose terms are already in $D$, so the definition is mind-bogglingly simple!
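Because the extended function acts term by term, it is easy to imitate. The Python sketch below (the names `extend` and `eps` are illustrative) represents a sequence as a function of $n$, extends the real square root to such sequences, and applies it to $\varepsilon = [1, \frac{1}{2}, \ldots, \frac{1}{n}, \ldots]$, recovering $\sqrt{\varepsilon} = [1, \frac{1}{\sqrt{2}}, \ldots, \frac{1}{\sqrt{n}}, \ldots]$. Squaring it back agrees with $\varepsilon$ at every index checked, up to floating-point rounding.

```python
import math
from typing import Callable

Seq = Callable[[int], float]


def extend(f: Callable[[float], float]) -> Callable[[Seq], Seq]:
    """Extend f : D -> R to sequences in D by acting on each term: f([x_n]) = [f(x_n)]."""
    return lambda x: (lambda n: f(x(n)))


def eps(n: int) -> float:
    return 1.0 / n  # ε = [1, 1/2, ..., 1/n, ...]


star_sqrt = extend(math.sqrt)
root_eps = star_sqrt(eps)  # √ε = [1, 1/√2, ..., 1/√n, ...]

for n in (1, 4, 100):
    print(n, root_eps(n), math.isclose(root_eps(n) ** 2, eps(n)))
```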


Amazing Possibilities in Non-standard Analysis 


Once we have the ideas of the hyperreals, we can do amazing things. Consider the extension ${}^*\mathbb{N}$ of the natural numbers $\mathbb{N}$. By definition, ${}^*\mathbb{N}$ includes all equivalence classes of sequences of natural numbers, so $\omega = [1, 2, 3, \ldots, n, \ldots] \in {}^*\mathbb{N}$. This shows us that ${}^*\mathbb{N}$ contains infinite elements.
To calculate the limit of a sequence $(x_n)$, we consider the function $f : \mathbb{N} \to \mathbb{R}$ given by $f(n) = x_n$, extend it to $f : {}^*\mathbb{N} \to {}^*\mathbb{R}$, and consider $f(N) = x_N$ for infinite $N \in {}^*\mathbb{N}$. If $x_N$ is finite then we calculate $\operatorname{st}(x_N)$ and, if we get the same value for all infinite $N$, then this is the limit of the sequence. For instance, if
$$x_n = \frac{6n^2 + n}{2n^2 - 1}$$
then
$$x_N = \frac{6N^2 + N}{2N^2 - 1} = \frac{6 + 1/N}{2 - 1/N^2}$$
and, as $1/N$ is infinitesimal,
$$\operatorname{st}(x_N) = \frac{6 + 0}{2 - 0} = 3.$$
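The algebra above can be checked with exact rational arithmetic for particular large natural numbers $N$. The Python sketch below is only an illustration, since any actual natural number is finite however large; it confirms the rewriting of $x_N$ and shows that $x_N - 3 = (N + 3)/(2N^2 - 1)$. This gap is less than $1/N$ whenever $N \geq 4$, so for infinite $N$ it is infinitesimal, which is another way of seeing that $\operatorname{st}(x_N) = 3$.

```python
from fractions import Fraction


def x(N: int) -> Fraction:
    """x_N = (6N^2 + N) / (2N^2 - 1), computed exactly."""
    return Fraction(6 * N**2 + N, 2 * N**2 - 1)


for N in (10, 1000, 10**6):
    rewritten = (6 + Fraction(1, N)) / (2 - Fraction(1, N**2))
    gap = x(N) - 3
    assert rewritten == x(N)                       # the two forms of x_N agree
    assert gap == Fraction(N + 3, 2 * N**2 - 1)    # x_N - 3 = (N + 3)/(2N^2 - 1)
    assert gap < Fraction(1, N)                    # smaller than 1/N
    print(N, float(gap))
```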
Other definitions such as continuity or uniform continuity can be expressed simply in terms of infinitesimals:

Definition 15.23: $f : D \to \mathbb{R}$ is continuous at $x \in D$ if
$$\forall y \in {}^*D : x - y \text{ infinitesimal implies } f(x) - f(y) \text{ is infinitesimal}.$$

Definition 15.24: $f : D \to \mathbb{R}$ is uniformly continuous in $D$ if
$$\forall x \in {}^*D, \forall y \in {}^*D : x - y \text{ infinitesimal implies } f(x) - f(y) \text{ is infinitesimal}.$$

Essentially, the difference between the two is that continuity involves $x \in D$, $y \in {}^*D$, whereas uniform continuity involves $x, y \in {}^*D$.
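A standard example shows the difference; the choice $f(x) = x^2$ on $D = \mathbb{R}$ is ours, not the text's. At the real point $x = 2$ with $y = [2 + \frac{1}{n}]$, we get $f(y) - f(x) = [\frac{4}{n} + \frac{1}{n^2}]$, which is infinitesimal, consistent with $f$ being continuous at 2 (one $y$ does not prove continuity, of course). But with the infinite $x = \omega = [n]$ and $y = [n + \frac{1}{n}]$, $x - y$ is infinitesimal while $f(y) - f(x) = [2 + \frac{1}{n^2}]$ is greater than 2 and so not infinitesimal; a single such pair shows that $f$ is not uniformly continuous on $\mathbb{R}$. The Python sketch checks these termwise formulas with exact fractions.

```python
from fractions import Fraction


def f(t: Fraction) -> Fraction:
    return t * t


def diff_terms(x, y, ns):
    """Terms f(y_n) - f(x_n) for the given indices n."""
    return [f(y(n)) - f(x(n)) for n in ns]


ns = [1, 10, 100, 1000]

# Continuity at the real point 2: x = [2, 2, ...], y = [2 + 1/n]
near_two = diff_terms(lambda n: Fraction(2), lambda n: 2 + Fraction(1, n), ns)
# Uniform continuity test with infinite x = ω = [n] and y = [n + 1/n]
near_omega = diff_terms(lambda n: Fraction(n), lambda n: n + Fraction(1, n), ns)

print([float(d) for d in near_two])    # shrinks like 4/n
print([float(d) for d in near_omega])  # stays above 2
```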

These ideas extend to more general functions $f : D \to \mathbb{R}^m$ where $D \subseteq \mathbb{R}^n$, and all relationships involving such functions remain true when extended. For instance, if $D$ is the inside of the unit sphere $x^2 + y^2 + z^2 < 1$ in $\mathbb{R}^3$, then $D$ generalises to the inside of the unit sphere ${}^*D$ in ${}^*\mathbb{R}^3$, given by the same formula.

Predicates $P(x_1, x_2, \ldots, x_n)$ in $n$ variables, such as the commutative law or the distributive law for elements in $\mathbb{R}$,
$$x + y = y + x, \qquad x(y + z) = xy + xz,$$
extend to the same relationships in ${}^*\mathbb{R}$.
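If we quantify relationships such as
$$\forall x \in \mathbb{R}\ \forall y \in \mathbb{R} : x + y = y + x$$
$$\exists 0 \in \mathbb{R}\ \forall x \in \mathbb{R} : x + 0 = x$$
$$\forall x \in \mathbb{R}\ \exists y \in \mathbb{R} : x + y = 0$$
then all these relationships generalise to
$$\forall x \in {}^*\mathbb{R}\ \forall y \in {}^*\mathbb{R} : x + y = y + x$$
$$\exists 0 \in {}^*\mathbb{R}\ \forall x \in {}^*\mathbb{R} : x + 0 = x$$
$$\forall x \in {}^*\mathbb{R}\ \exists y \in {}^*\mathbb{R} : x + y = 0.$$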


But some properties do not generalise, for instance the completeness axiom, which says
$$\forall S \subseteq \mathbb{R} : S \text{ non-empty and bounded above implies } S \text{ has a least upper bound}.$$
This axiom quantifies a set $S$. All the other axioms for a complete ordered field only quantify elements of a set. It is this observation that makes non-standard analysis work.


Definition 15.25: A quantified predicate is said to be a first-order logical 
statement if it only quantifies elements of sets. 


All the axioms for a complete ordered field except the completeness axiom are first-order statements. All the first-order axioms extend from $\mathbb{R}$ to ${}^*\mathbb{R}$. The completeness axiom does not.

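The axioms for the natural numbers $\mathbb{N}$ also exhibit the same phenomenon. (N1) and (N2) are first-order statements. However, the induction axiom (N3), which says
$$\forall S \subseteq \mathbb{N} : \text{if } (1 \in S \text{ and } n \in S \Rightarrow n + 1 \in S) \text{ then } S = \mathbb{N},$$
is not. The extension ${}^*\mathbb{N}$ satisfies (N1) and (N2), but not (N3). For example, the set $S = \mathbb{N}$ is a subset of ${}^*\mathbb{N}$ and satisfies $1 \in S$ and $n \in S \Rightarrow n + 1 \in S$, but $S$ does not equal ${}^*\mathbb{N}$ because $\omega \in {}^*\mathbb{N}$ and $\omega \notin \mathbb{N}$.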

Non-standard analysis can be shown to satisfy:

The Transfer Principle: Any true first-order logical statement involving elements in $\mathbb{R}$ remains true when extended to ${}^*\mathbb{R}$.


If this principle is taken as an axiom, then it can be used as a basis for developing the theory of non-standard analysis. However, it is not our intention to take these matters further: this is a book on foundations of mathematics, not non-standard analysis. Our main reason for including material on infinitesimal ideas is to show that, as mathematics evolves, new theories are developed that change the way that we think about mathematics.
At this stage in history, analysis is studied using standard epsilon-delta techniques, and there is a very good reason for this. Calculus using infinitesimals can be pictured using super ordered fields, which give us a natural sense of ideas that have arisen in various forms in earlier generations; but setting it up formally is another matter.

To do non-standard analysis properly requires the construction of an ultrafilter $U$ on the natural numbers. This requires deciding for every subset $T \subseteq \mathbb{N}$ whether $T$ or its complement is in $U$ in a manner that fits the definition of an ultrafilter. It involves making an infinite, even uncountable, number of choices. As human beings, we certainly can't do this unaided in our finite lifetime.

The definition of the natural numbers N requires a potential infinity of 
elements but we can at least imagine that theoretically we can reach any 
given element in the sequence, even if this may be utterly impracticable for 
very large numbers. But to contemplate making an uncountable number of 
choices to define an ultrafilter seems to demand more than the human brain 
can take. 

If we look back to the different strands of mathematics that emerged at the beginning of the twentieth century in terms of intuitionism, logicism, and formalism, we have a number of different options. An intuitionist would reject non-standard analysis because the construction of an ultrafilter is beyond our human capacity to accomplish in a finite sequence of steps. Errett Bishop took this position in his book on constructive analysis [14]. On the other hand, a logicist may be happy to use first-order logic to formulate the theory, and that is how Abraham Robinson developed the idea [29]. A formalist mathematician, who may use natural ideas to get initial inspiration, subsequently requires theories formulated using set-theoretic definitions and mathematical proof.

In today's mathematical world, pure mathematics broadly follows the formalist approach, because the logical foundation required as a basis for non-standard analysis has a high initial cost. In this chapter we have shown how ideas about infinitesimals may be visualised in a natural way on a number line using the idea of magnification based on algebraic operations. We have also shown how this leads to a way of defining the system formally in terms of an ultrafilter. This requires a further stretch of imagination that some may be willing to accept as part of a more sophisticated form of mathematics, but that others may consider to be unattainable.

In the final chapter of this book we will contemplate the next step, strengthening the foundations of mathematics by axiomatising set theory itself. This allows us to include a further axiom, the axiom of choice, which, if taken as an additional axiom for set theory, may be used to prove more powerful results, including a logical development of infinitesimal calculus.


Exercises 


1. In the field $\mathbb{R}(x)$ using the order specified in definition 15.2, which of the following are positive:
(a) $x^2 - 2x$
(b) $1/(x^2 - 2x)$
(c) $x - 1000x^2$
(d) $1000x^2 - x$
(e) $a + bx + cx^2$ for real values of $a, b, c$ (taking all possible cases into account)
(f) $1/(a + bx + cx^2)$ for various real values of $a, b, c$.


2. Place the following elements of R(x) in order: 


x, 0, 2x”, -x?, 1/(1 — x), 25, —x, -x°/(1 - 3x). 


3. How would you test whether a general rational function
$$\frac{a_n x^n + \cdots + a_0}{b_m x^m + \cdots + b_0} \qquad (\text{where } a_i, b_j \in \mathbb{R},\ b_m \neq 0)$$
is
(a) infinite
(b) infinitesimal
(c) finite?
Write out a full explanation that makes sense to you in a way that you can explain to someone else, taking care of every possible case.


4. Let $F$ be any ordered field, which must contain the rational numbers. An element $k \in F$ is said to be positive infinite if $k > x$ for all $x \in \mathbb{Q}$. Make similar definitions to say when an element $k$ is
(a) negative infinite
(b) finite
(c) positive infinitesimal
(d) negative infinitesimal.

Prove the following:

(e) $k$ is positive infinite if and only if $1/k$ is positive infinitesimal.
(f) If $k$ is infinitesimal, then $k^2$ is infinitesimal.
(g) If $k$ is infinite and $h$ is finite, then $k - h$ is infinite.
(h) If $k \in \mathbb{Q}$ is positive and $h > k$, then $h$ cannot be infinitesimal.
5. In proposition 15.20, write out in full detail the proof that the commutative, associative, and distributive properties of the operations of addition and multiplication on ${}^*\mathbb{R}$ all hold.

6. Let $K$ be a super ordered field. Write out a proof that the set $I$ of infinitesimals is bounded above but does not have a least upper bound.


7. Let $K$ be the super ordered field $\mathbb{R}(\varepsilon)$ where $\varepsilon$ is infinitesimal and $\varepsilon > 0$. Show that for any infinitesimal $\delta \in \mathbb{R}(\varepsilon)$, the element $\delta/\varepsilon$ is finite. Let $F$ be the set of finite elements in $\mathbb{R}(\varepsilon)$. Show that the function $t : F \to \mathbb{R} \times F$ given by
$$t(a + \delta) = (a, \delta/\varepsilon)$$
is an order-preserving bijection in which
$$a + \delta < b + \gamma \iff a < b \text{ or } (a = b \text{ and } \delta < \gamma).$$
In this bijection, show that the monad
$$M_a = \{a + \delta \in K \mid \delta \text{ is infinitesimal}\} \text{ in } F$$
corresponds to the vertical line through $a \in \mathbb{R}$. This representation should give you a better sense of why the monads are bounded above but do not have least upper bounds. Explain this in your own words.


8. Use the transfer principle with the statement
$$\forall x \in \mathbb{R} : x > 0 \Rightarrow \exists y > 0 : y^2 = x$$
to deduce that every positive $x \in {}^*\mathbb{R}$ has a square root in ${}^*\mathbb{R}$ and that if $\varepsilon > 0$ is infinitesimal, then its square root $\delta = \sqrt{\varepsilon}$ is a higher-order infinitesimal.

Show that the function $t(a + \delta) = (a, \delta/\varepsilon)$ maps the finite elements $a + \delta$ to $\mathbb{R} \times {}^*\mathbb{R}$, in which the image of the monad $M_a$ lies in the vertical line ${}^*\mathbb{R}$ through the point $a$ on the horizontal real line, and that $a + \delta < b + \gamma$ for real $a, b$ and infinitesimal $\delta, \gamma$ if and only if $a < b$, or $a = b$ and the elements are in the same monad with $\delta < \gamma$.


9. Using the notation $[a_n]$ to represent the equivalence class of the sequence $(a_n)$ of real numbers, write down an element $[a_n]$ which is:

(a) the sum of an infinite number and an infinitesimal

(b) the cube root of [an] 

(c) a number equal to $\omega = [1, 2, \ldots, n, \ldots]$ where $(a_n) \neq (1, 2, \ldots, n, \ldots)$

(d) a number of higher order than $\omega$.
10. Reflect on the chapter as a whole, reading it through to explain the ideas to yourself and to discuss these ideas with others. At this point you may not be fluent in operating with the ideas, but it is important to gain a sense of how infinitesimal and infinite quantities can be imagined visually and manipulated algebraically to lead to more sophisticated possibilities in later developments.
PART V
Strengthening the Foundations


Part IV showed how the material developed so far can lead into the main 
body of mathematics, into ever higher realms. But this final part will lead in 
the opposite direction: down into the depths. 

There is a reason for this. 

Having constructed such a fine building, it becomes prudent to re-examine 
the ground on which it rests. We have replaced a very complicated system of 
intuitions about numbers by a rather simpler system of intuitions about sets. 
But our set-theoretic basis is still intuitive and informal. If we had built a bungalow, this might not have been important; but we have built a skyscraper,
and one that can be extended to much greater heights. It is time to dig a little 
deeper into the foundations, to see whether they really can support all that 
weight. Or, in a horticultural analogy, we must make sure that the roots of 
our plant will support the fully grown organism, which may require us to 
improve the soil, use higher quality fertiliser, and sow better seed. 

In mathematical terms, as expressed by Klein's quotation at the end of the opening chapter, the power of mathematics depends not only on building ever more sophisticated branches of the mathematical tree, but also on growing deeper roots to support the ever-growing branches that reach up to the sky.

Our aim here is to indicate what can be done, but not actually to do it. So 
we talk in an informal way about the possibility of a formal system of axioms 
for set theory itself. It may seem that the argument has come full circle: here 
we are, right back at the beginning, worrying about the same things as before. 
In fact this is not so: we have come more in a spiral, returning to the same 
point but at a higher level. We now understand the problems involved, and 
their solutions, much better than before. The material we have covered so far 
is quite adequate for almost all of a university course in mathematics. But we 
should not imagine that we have reached a complete and final solution, or 
that total perfection has now been attained. 


CHAPTER 16 


Axioms for Set Theory 


Up to this point, we have concentrated on deriving a formal structure for arithmetic based on set theory. This analysis has provided a deeper understanding of the various number systems, how they work, and their place in the scheme of things. It should also have sharpened your critical faculties and your appreciation for logical rigour. It may have sharpened them sufficiently to see that one fundamental ingredient is still lacking. We have axiomatised everything we can lay hands on, with one notable exception: set theory itself.

Having taken such pains with the structural detail of the number systems, 
it would be a great pity if the basis on which we worked should turn out to be 
defective—unable to support the weight of the superstructure erected on it. 
In the ultimate analysis, it is hardly more satisfactory to base a formal theory 
of numbers on an informal, intuitive, and naive theory of sets than it is to 
start with an informal, intuitive, and naive theory of numbers themselves. 

However, we may yet escape this criticism by returning to our starting 
point and axiomatising set theory as well. (It would, indeed, have been pleasant to have started off from an axiomatic basis for set theory, except that
there are enormous psychological barriers involved in doing something so 
far removed from reality with no idea why it is needed.) We will not go into 
the details very deeply (see Mendelson [27] if you want to do this), nor shall 
we adopt an overly formal style in discussing them. Our aim is merely to 
make clear the unconscious assumptions that have been made about sets, to 
discard some over-optimistic ones that lead to paradoxes, and to list a system 
of axioms that offers a stronger basis for formal mathematical theory. 

Historically, some mathematicians hoped for more than this. At the turn 
of the century a number of them, led by David Hilbert, embarked upon a 
kind of Arthurian Quest for Truth: a firm and immutable basis for mathematics and a guarantee that the truths of mathematics can be rendered
absolute. In this impermanent and uncertain universe, it is hardly surprising 
that the Holy Grail turned out, in the end, to be a mare’s nest. 
Some Difficulties


The problems with naive set theory are of two kinds. First, there are the paradoxes: apparently contradictory results obtained by apparently impeccable logic. Then there are purely technical difficulties: are infinite cardinals always comparable? Is there a cardinal between $\aleph_0$ and $2^{\aleph_0}$?

By way of motivation, we consider two paradoxes. The first, due to Bertrand Russell, was alluded to in chapter 3. If
$$S = \{x \mid x \notin x\},$$
then is $S \in S$ or $S \notin S$? Either answer directly implies the other!
For the second, let $U$ be the set of all things, defined (say) by
$$U = \{x \mid x = x\}.$$
Now $X \subseteq U$ for every set $X$. In particular, the power set $\mathcal{P}(U) \subseteq U$. Taking cardinals,
$$|\mathcal{P}(U)| \leq |U|,$$
but by proposition 12.5 of chapter 12,
$$|U| < |\mathcal{P}(U)|.$$
This is a contradiction: what's wrong?
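For finite sets the strict inequality $|X| < |\mathcal{P}(X)|$ can be checked exhaustively, using the classical diagonal set $D = \{x \in X \mid x \notin f(x)\}$: if $D = f(x_0)$ then $x_0 \in D$ if and only if $x_0 \notin D$. The Python sketch below (an illustration on a three-element set chosen by us) runs through all $8^3 = 512$ functions $f : X \to \mathcal{P}(X)$ and confirms that $D$ is never a value of $f$, so no such $f$ is onto.

```python
from itertools import combinations, product

X = [0, 1, 2]
subsets = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]


def diagonal(f: dict) -> frozenset:
    """D = {x in X | x not in f(x)}."""
    return frozenset(x for x in X if x not in f[x])


count = 0
for values in product(subsets, repeat=len(X)):
    f = dict(zip(X, values))          # one function f : X -> P(X)
    assert diagonal(f) not in values  # D is never a value of f
    count += 1

print(len(subsets), count)  # 8 512
```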
Many responses are possible, among them: 


The Ostrich. Ignore the difficulties and maybe they'll go away. 


The Drop-out. The paradoxes point to unavoidable defects in mathematics. Give up, and take up something more profitable such as knitting or
sociology. 


The Optimist. Re-examine the reasoning, isolate the source of the difficulties, 
and try to salvage what is worth saving while disposing of the paradoxes. 


If you agree with the Ostrich, stop reading here. If with the Drop-out, burn 
this book. If with the Optimist, read on... 


Sets and Classes

In the next few sections we discuss one possible solution to the problems, known as von Neumann-Bernays-Gödel set theory. This starts from the observation that a plausible source of trouble is the freedom to form weird and very large sets (for example, the two sets $S$ and $U$ defined above). All of the known paradoxes seem to 'cheat' in this way.

We therefore distinguish two things: classes, which may be thought of as 
arbitrary collections (what we hitherto have naively called ‘sets’), and sets, 
which are respectable classes. Then we restrict our ability to define weird 
or large creatures to classes only. This is the idea: the details are roughly as 
follows. 

Classes are introduced as a primitive, undefined term, along with a relation $\in$ (corresponding to the intuitive idea of membership) and its negation $\notin$. If $X$ and $Y$ are classes, then one or other of
$$X \in Y, \qquad X \notin Y$$
is required to hold. We define equality of classes $X = Y$ by
$$(\forall Z)(Z \in X \Leftrightarrow Z \in Y).$$


We say that a class $X$ is a set if $X \in Y$ for some class $Y$. This is the crucial definition: sets are those classes that can be members of other classes.

This is quite different from the intuitive feeling that sets are things of which other things are members. The difference is what makes it hard to define weird and large sets. To make this work, we agree that an expression like
$$\{x \mid P(x)\}$$
means 'the class of all sets $x$ for which $P(x)$ is true'. This restriction is forced upon us, because only sets can be members of classes anyway. It has the beneficial effect of blocking paradoxes. For example, consider Russell's class
$$S = \{X \mid X \notin X\}.$$


In the new interpretation, this is the class of all sets $X$ such that $X \notin X$. Let us run through the usual argument for a contradiction, and see what happens. Suppose that $S \in S$. Then $S$ is a member of something, so it is a set; as a member of $S$ it satisfies the defining property, so $S \notin S$, a contradiction. Now suppose $S \notin S$. If $S$ is a set, then it satisfies the defining property $X \notin X$, so by the definition, $S \in S$. This is a contradiction too. There remains, however, the possibility that $S$ is not a set. In this case we cannot deduce that $S \in S$; elements of $S$ have to be sets as well as not being members of themselves.

The upshot is that we don’t get a paradox. All we get is a proof that S is not 
a set. Classes that are not sets are called proper classes; we have just proved 
that they exist. Similarly U may be proved to be a proper class, and again 
there is no paradox. 
The Axioms Themselves


The majority of the axioms required come as an anticlimax, because all they do is state that things we obviously want to be sets are sets. For convenience, we assume that the usual notation of set theory also applies to classes, in the obvious way. For instance we define
$$\emptyset = \{x \mid x \neq x\},$$
$$\{x, y\} = \{u \mid u = x \text{ or } u = y\},$$
and so on.

From now on we make the convention that small letters $x, y, z, \ldots$ stand for sets, whereas capitals $X, Y, Z, \ldots$ stand for classes, which may or may not be sets.


(S1) Extensionality. $X = Y \Rightarrow (\forall Z)(X \in Z \Leftrightarrow Y \in Z)$.

We have defined equality of classes as 'having the same members'. This purely technical axiom says that equal classes belong to the same things.

(S2) Null set. $\emptyset$ is a set.
(S3) Pairs. $\{x, y\}$ is a set for all sets $x, y$.

We now define singletons by $\{x\} = \{x, x\}$, then ordered pairs using the Kuratowski definition $(x, y) = \{\{x\}, \{x, y\}\}$, then functions and relations, as before.
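The Kuratowski definition can be tried out with Python's `frozenset`, which is hashable and so can itself be an element of a set. The sketch below (an illustration over a small range of integers, not a proof) checks the characteristic property $(x, y) = (u, v)$ if and only if $x = u$ and $y = v$, including the degenerate case $(x, x) = \{\{x\}\}$.

```python
from itertools import product


def kpair(x, y) -> frozenset:
    """Kuratowski ordered pair (x, y) = {{x}, {x, y}}."""
    return frozenset({frozenset({x}), frozenset({x, y})})


values = range(4)
ok = all(
    (kpair(x, y) == kpair(u, v)) == (x == u and y == v)
    for x, y, u, v in product(values, repeat=4)
)
print(ok)                                          # True
print(kpair(1, 1) == frozenset({frozenset({1})}))  # True: (x, x) = {{x}}
print(kpair(1, 2) == kpair(2, 1))                  # False: order matters
```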


(S4) Membership. $\in$ is a relation, that is, there exists a class $M$ of ordered pairs $(x, y)$ such that $(x, y) \in M \Leftrightarrow x \in y$.

(S5) Intersection. If $X, Y$ are classes, there is a class $X \cap Y$.

(S6) Complement. If $X$ is a class, its complement $X^c$ exists and is a class.

(S7) Domain. If $X$ is a class of ordered pairs, there exists a class $Z$ such that $u \in Z \Leftrightarrow (u, v) \in X$ for some $v$.


Much more interesting is an axiom for defining a class by a property of its elements, analogous to $\{x \mid P(x)\}$. We state here a general axiom: it can be derived if desired from a small number of more specialised axioms of the same type.

(S8) Class existence. Let $\Phi(x_1, \ldots, x_n, Y_1, \ldots, Y_m)$ be a compound predicate statement in which only set variables are quantified. Then there exists a class $Z$ such that
$$(x_1, \ldots, x_n) \in Z \Leftrightarrow \Phi(x_1, \ldots, x_n, Y_1, \ldots, Y_m).$$
We write
$$Z = \{(x_1, \ldots, x_n) \mid \Phi(x_1, \ldots, x_n, Y_1, \ldots, Y_m)\}.$$
Notice that the $x$s here are sets. In particular, the class
$$Z = \{x \mid P(x)\}$$
contains as members only those sets $x$ for which $P(x)$ is true. This, as we saw above, allows us to avoid paradoxes.


(S9) Union. The union of a set of sets is a set.
(S10) Power set. If $x$ is a set, so is $\mathcal{P}(x)$.
(S11) Subset. If $x$ is a set and $X$ a class, then $x \cap X$ is a set.

There is also an axiom that asserts a slight generalisation of the following:

(S12) Replacement. If $f$ is a function whose domain is a set, then its image is a set.


These axioms suffice for almost all of the constructions we have made using set theory. However, they all hold good even if we restrict ourselves only to finite sets. So we need an axiom to say that infinite sets exist, otherwise we cannot construct any of our beloved number systems. We therefore add an axiom introduced in chapter 8 (von Neumann's brainwave):

(S13) Axiom of infinity. There exists a set $x$ such that $\emptyset \in x$, and whenever $y \in x$ it follows that $y \cup \{y\} \in x$.

Using von Neumann's definition of natural numbers, this axiom boils down to the assertion that the natural numbers form a set. It is pretty clear that without some such assertion, set theory would not be much use.
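Von Neumann's successor operation $y \mapsto y \cup \{y\}$ can be imitated with Python's `frozenset`. The sketch below is illustrative only: it can build just finitely many stages, which is exactly why (S13) has to be postulated as an axiom. It constructs $0 = \emptyset$, $1 = \{\emptyset\}$, $2 = \{\emptyset, \{\emptyset\}\}$, and so on, and checks that each $n$ has exactly $n$ elements and that $m \in n$ precisely when $m < n$.

```python
def successor(y: frozenset) -> frozenset:
    """von Neumann successor: y ∪ {y}."""
    return y | frozenset({y})


naturals = [frozenset()]  # 0 = ∅
for _ in range(5):
    naturals.append(successor(naturals[-1]))

for n, vn in enumerate(naturals):
    assert len(vn) == n  # n has exactly n elements
    assert all((naturals[m] in vn) == (m < n) for m in range(len(naturals)))

print([len(vn) for vn in naturals])  # [0, 1, 2, 3, 4, 5]
```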

The thirteen axioms listed so far suffice for almost all of our previous work, though a detailed proof is (as usual) somewhat involved and tedious. However, some of the problems in the chapters on cardinals and infinitesimals require more delicate axioms yet.


The Axiom of Choice 


Proposition 12.5 of chapter 12 used an argument that involved selecting an element $x_1$ from a set $B$, then $x_2$ from $B \setminus \{x_1\}$, ..., and in general an element $x_{n+1}$ from $B \setminus \{x_1, \ldots, x_n\}$. Although this looks like a recursion argument, it is not covered by the recursion theorem (theorem 8.3 of chapter 8), since $x_{n+1}$ is found by an arbitrary choice and not in terms of a previously specified function. Roughly speaking, the method asks us to make 'infinitely many arbitrary choices'. It turns out (though not easily!) that the list of axioms we have so far produced is insufficient to justify this. We therefore state an additional axiom:


(S14) Axiom of choice. If $\{x_\alpha\}_{\alpha \in a}$ is an indexed family of non-empty sets (with an index set $a$) then there exists a function $f$ such that
$$f : a \to \bigcup_{\alpha \in a} x_\alpha$$
and
$$f(\alpha) \in x_\alpha \text{ for each } \alpha \in a.$$
In other words, $f$ 'chooses' for each $\alpha \in a$ an element of $x_\alpha$. This seems quite reasonable. After all, it is essentially saying that if we have a family of non-empty sets, we can choose an element from each one of them all at the same time. But its logical status proves to be difficult to grasp, though it is now well understood by mathematical logicians.
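For a finite family, or whenever there is an explicit rule for picking elements, no extra axiom is needed: the rule itself defines the choice function. The Python sketch below (a hypothetical family of non-empty sets of integers) uses the rule 'take the least element' and checks that $f(\alpha) \in x_\alpha$ for each index $\alpha$. The axiom of choice is needed precisely when no such rule is available for an infinite family, for example a family of arbitrary non-empty subsets of $\mathbb{R}$, which need not have least elements.

```python
from typing import Dict, Hashable, Set


def least_element_choice(family: Dict[Hashable, Set[int]]) -> Dict[Hashable, int]:
    """A choice function f with f(alpha) in x_alpha, defined by the rule 'take the minimum'."""
    if any(len(s) == 0 for s in family.values()):
        raise ValueError("every set in the family must be non-empty")
    return {alpha: min(s) for alpha, s in family.items()}


family = {"a": {3, 7, 9}, "b": {2}, "c": {10, 4, 8}}
f = least_element_choice(family)
print(f)  # {'a': 3, 'b': 2, 'c': 4}
assert all(f[alpha] in family[alpha] for alpha in family)
```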

Neither its truth nor its falsity contradicts axioms (S1)-(S13) (in the same way that neither the truth nor the falsity of the commutative law contradicts the axioms for a group: there exist both commutative and non-commutative groups). The first fact was proved by Kurt Gödel in 1940, the second (a long-unsolved problem) by Paul Cohen in 1963. For this reason it is customary in mathematics to point out whenever the axiom of choice is being used, whereas the ordinary axioms (S1)-(S13) are not normally mentioned.

Assuming the axiom of choice allows us to tidy up two loose ends that arose in the chapters on infinite cardinals and infinitesimals. It implies that for any sets $x, y$, either $|x| \geq |y|$ or $|y| \geq |x|$, so that any two infinite cardinals can be compared. (For a proof, see Mendelson [27] p. 198.) It also gives a proof that an ultrafilter can be defined on the natural numbers, hence providing a proof of the existence of the hyperreal number system. This requires considering each subset $T \subseteq \mathbb{N}$ and placing either it or its complement into the set $U$ of subsets so that the conditions (U1)-(U4) are satisfied. We can start by placing every cofinite set in $U$ and leaving every finite set out of $U$. Then we consider the other sets that have not yet been assigned and decide whether they should be placed in $U$ or not, while still maintaining the conditions (U1)-(U4). Since the sets concerned are in the power set $\mathcal{P}(\mathbb{N})$, which has cardinal number greater than that of $\mathbb{N}$, it turns out that we cannot prove this by a regular induction proof, but we can prove it using the axiom of choice (see, for example, [9] on the internet).
As mathematics grows more sophisticated, it turns out that new possibilities
occur that need additional axioms. For example, Cantor formulated the
Continuum Hypothesis that 


there is no infinite cardinal lying properly between $\aleph_0$ and $2^{\aleph_0}$.


It happens that neither its truth nor its falsity contradicts (S1)-(S13), or 
even (S1)-(S14). The proofs are again due to Gödel and Cohen. It is perhaps 
surprising that so specific a problem should have such an unspecific answer; 
but it shows how delicate the problems are. 

Other, different, axioms have also been proposed at various times, and 
many of the relations between them are now understood quite well. We refer 
the reader to more specialised texts. 


Consistency 


However, there is one final problem. Having got our set of axioms, how do we 
know that no paradoxes arise? We certainly seem to have avoided them (for 
instance, no one has ever been able to find any), but how can we be certain 
there are no hidden contradictions? A firm, final answer to this question is 
now known. Unfortunately, this is it: we can never be certain. 

To explain this, we must go back to the time of Hilbert. Call a system of 
axioms consistent if it does not lead to logical contradictions. Hilbert wanted 
to prove that the axioms for set theory are consistent. 

For some axiom systems this is easy. If we can find a model for the
axioms, that is, a structure that satisfies them, they must be consistent, or else
the model could not exist. The trouble is, what materials do we allow for the
construction of the model? It is generally agreed that a finite model is
unexceptionable, because any assertion about it can be checked, in principle, in a
finite time. But the axiom of infinity, for example, means that we cannot find
a finite model for set theory.

Hilbert’s idea was that something less restricted should suffice: what he 
called a decision procedure. This is, so to speak, a program consisting of a 
finite sequence of decisions which, when fed a formula in set theory, can 
decide whether it is true (like the truth-table method for propositions). If we 
can find such a program, and prove that it always works, then we can feed it 
the equation 


$0 \neq 0$


and see what it says. If it says ‘true’ then our axioms must be
inconsistent, since any contradiction implies the above proposition (use a vacuous
argument by contradiction: anything is true in an inconsistent system!).
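For propositions, the truth-table method already is a decision procedure in Hilbert’s sense, because a formula in n variables has only $2^n$ rows to check. The Python sketch below assumes an illustrative encoding in which formulas are Python functions of boolean arguments. Its first usage line also checks the vacuous principle just mentioned: a contradiction $P \wedge \neg P$ implies any proposition $Q$.

```python
from itertools import product

# Formulas are encoded as Python functions of boolean arguments
# (an illustrative encoding, not notation from the text).
def implies(p, q):
    """Material implication: false only when p is true and q is false."""
    return (not p) or q

def is_tautology(formula, num_vars):
    """Decide whether `formula` is true under every assignment of truth values.

    There are 2**num_vars rows in the truth table, so the check always
    terminates after finitely many steps: a decision procedure.
    """
    return all(formula(*row) for row in product([True, False], repeat=num_vars))

# A contradiction implies anything: (P and not P) => Q is a tautology.
print(is_tautology(lambda p, q: implies(p and not p, q), 2))  # True
# P => Q alone is not a tautology (it fails when P is true and Q is false).
print(is_tautology(lambda p, q: implies(p, q), 2))  # False
```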
For a while it looked as if Hilbert’s idea might work.

Then Gödel dashed all hopes by proving two theorems. The first is that
there exist, in set theory, statements that are true, but for which there exists
neither a proof nor a disproof.¹ The second: that if set theory is consistent,
then there does not exist any decision procedure that will prove it to be.

The proofs of Gödel’s theorems are quite technical: they are sketched
in Stewart [32] pp. 294-5. But they demolish Hilbert’s hope of a complete 
consistency proof. 

Does this mean that it is, after all, pointless to seek greater logical rigour in 
mathematics? After all, if at the end the whole thing hovers in limbo, it hardly 
seems worth bothering in the first place. This is emphatically not the moral to 
be drawn. Without a proper search for rigour, we would never have reached 
Gödel’s theorems. What they do is pin down certain problems inherent in
the axiomatic approach itself. 

They do not demonstrate it to be futile: on the contrary, it provides an
adequate framework for the whole of modern mathematics, and an inspiration
for the development of new ideas. But with Gödel’s theorems we can avoid
deluding ourselves that everything is perfect, and understand the limitations
of the axiomatic method as well as its strengths.


Exercises 


1. Show that the axiom of choice implies that if $f\colon A \to B$ is a surjection,
then $|A| \ge |B|$. Conversely, in the context of the other axioms of set
theory, prove that the latter fact implies the axiom of choice.


2. Given a collection of sets $\{X_\alpha\}_{\alpha \in A}$ indexed by a set $A$, the cartesian
product is defined to be the set of all functions $f\colon A \to \bigcup_{\alpha \in A} X_\alpha$ such
that $f(\alpha) \in X_\alpha$ for each $\alpha \in A$. Show that for $A = \{1, 2, \ldots, n\}$ this
corresponds to the usual definition of $X_1 \times X_2 \times \cdots \times X_n$.

Prove that the axiom of choice is equivalent to the assertion that 
every cartesian product of non-empty sets is itself non-empty. 
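For a finite index set the definition in exercise 2 can be explored by brute force. The Python sketch below uses an illustrative family X indexed by A = {1, 2, 3}, and `choice_functions` is a hypothetical helper name. It lists every function f with $f(\alpha) \in X_\alpha$, checks that these correspond exactly to the ordered triples in $X_1 \times X_2 \times X_3$, and shows that one empty factor makes the product empty. This covers only finite families, where no axiom of choice is needed, so it is an illustration rather than a solution.

```python
from itertools import product

def choice_functions(family):
    """All functions f on the index set with f[a] in family[a] for every index a.

    `family` maps each index to a finite set; each function is returned as a dict.
    """
    indices = sorted(family)
    # Choose one element from each set in turn; each combination is one function.
    return [dict(zip(indices, values))
            for values in product(*(sorted(family[a]) for a in indices))]

# Illustrative family indexed by A = {1, 2, 3}.
X = {1: {"a", "b"}, 2: {0, 1}, 3: {"z"}}

fs = choice_functions(X)
# Each function f corresponds to the ordered triple (f[1], f[2], f[3]).
triples = {tuple(f[i] for i in (1, 2, 3)) for f in fs}
assert triples == set(product(X[1], X[2], X[3]))
print(len(fs))  # 4 = 2 * 2 * 1

# If any X[i] is empty, there are no such functions: the product is empty.
print(choice_functions({1: {"a"}, 2: set()}))  # []
```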

3. Show that there is a choice involved in the proof of proposition 12.1 of 
chapter 12. Express it in terms of a function from a set of subsets of B 
to B. Is it necessary in this case to include all the subsets of B in the 
choice? 


¹ People always put it this way, but, interestingly, the negation of such a statement is also
‘true’. Both the statement and its negation are consistent with the other axioms.

4. Reconsider Goldbach’s conjecture (exercise 13 at the end of chapter 8),
which postulates that every even integer greater than 2 is the sum of two
primes. Look at as many cases of this as you wish to see if there is any
pattern to the primes which occur. Convince yourself that Goldbach’s
conjecture might be true but there may be no single proof which will
work for every case. On the other hand, there is always the possibility
that the conjecture is false for some very large integer which we have
not yet found.
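For exploring exercise 4, the short Python sketch below lists, for each even n from 4 to 30, every way of writing n as a sum of two primes. The helper names are hypothetical, and trial division was chosen for simplicity rather than speed. However many cases such a search checks, it is not a proof of the conjecture, which is the point of the exercise.

```python
def is_prime(n):
    """Trial division; adequate for the small numbers explored here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def goldbach_pairs(n):
    """All pairs (p, q) with p <= q, both prime, and p + q == n."""
    return [(p, n - p) for p in range(2, n // 2 + 1)
            if is_prime(p) and is_prime(n - p)]

for n in range(4, 31, 2):
    print(n, goldbach_pairs(n))
```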


5. Given a predicate $P(n)$ valid for all $n \in \mathbb{N}$, such that a proof for each
$P(n)$ exists in a finite number of lines as explained in chapter 6, is it
reasonable to expect that there is a proof of


$\forall n \in \mathbb{N}\colon P(n)$


in this sense?


6. Read chapter 1 again and the introductions to each of the five parts
into which the book is divided. Now review the exercises at the end of 
chapter 1. If you still have the solutions that you wrote out at the time 
you first read chapter 1, so much the better. If the book has achieved 
its purpose, your view on many of these topics will have matured and 
changed. You should now be in a position to appreciate the kind of 
thinking used in more advanced mathematics, together with an idea 
of the sort of problems in the foundations of the subject which are 
worthy of further study. 
APPENDIX


How to Read Proofs: 
The ‘Self-Explanation’ 
Strategy 


Prepared by Lara Alcock, Mark Hodds, Matthew Inglis, 
Mathematics Education Centre, Loughborough University 


The ‘self-explanation’ strategy has been found to enhance problem solving
and comprehension in learners across a wide variety of academic subjects.


It can help you to better understand mathematical proofs: in one recent 
research study students who had worked through these materials before 
reading a proof scored 30% higher than a control group on a subsequent 
proof comprehension test (see [3]). 


How to Self-Explain 


To improve your understanding of a proof, there is a series of techniques you 
should apply. After reading each line: 


Try to identify and elaborate the main ideas in the proof. 

Attempt to explain each line in terms of previous ideas. These may be ideas 
from the information in the proof, ideas from previous theorems/proofs, 
or ideas from your own prior knowledge of the topic area. 

Consider any questions that arise if new information contradicts your 
current understanding. 


Before proceeding to the next line of the proof you should ask yourself the
following:


Do I understand the ideas used in that line? 

Do I understand why those ideas have been used? 

How do those ideas link to other ideas in the proof, other theorems, or 
prior knowledge that I may have? 

Does the self-explanation I have generated help to answer the questions 
that I am asking? 
Below you will find an example showing possible self-explanations
generated by students when trying to understand a proof (the
labels ‘(L1)’ etc. in the proof indicate line numbers). Please read the
example carefully in order to understand how to use this strategy in your own
learning.
Example Self-Explanations


Theorem: No odd integer can be expressed as the sum of three even 
integers. 


Proof: 


(L1) Assume, to the contrary, that there is an odd integer x such that
x = a + b + c, where a, b, and c are even integers.

(L2) Then a = 2k, b = 2l, and c = 2p, for some integers k, l, and p.

(L3) Thus x = a + b + c = 2k + 2l + 2p = 2(k + l + p).

(L4) It follows that x is even; a contradiction. 

(L5) Thus no odd integer can be expressed as the sum of three even 
integers. 


After reading this proof, one reader made the following self-explanations: 


• ‘This proof uses the technique of proof by contradiction.’

• ‘Since a, b, and c are even integers, we have to use the definition of an even
integer, which is used in L2.’

• ‘The proof then replaces a, b, and c with their respective definitions in the
formula for x.’

• ‘The formula for x is then simplified and is shown to satisfy the definition
of an even integer also; a contradiction.’

• ‘Therefore, no odd integer can be expressed as the sum of three even
integers.’
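As an optional aside, the same argument can be checked line by line by a computer proof assistant. The following is a minimal sketch in Lean 4. It assumes a recent Lean version whose core includes the `rcases`/`obtain` and `omega` tactics, and the names `EvenInt`, `OddInt` and `odd_ne_sum_of_three_evens` are made up for illustration. The comments tie each step to lines (L1) to (L4) above, and the theorem statement itself plays the role of (L5).

```lean
-- Definitions matching the proof: even means 2k, odd means 2k + 1.
def EvenInt (n : Int) : Prop := ∃ k : Int, n = 2 * k
def OddInt (n : Int) : Prop := ∃ k : Int, n = 2 * k + 1

theorem odd_ne_sum_of_three_evens (x a b c : Int)
    (hx : OddInt x) (ha : EvenInt a) (hb : EvenInt b) (hc : EvenInt c) :
    x ≠ a + b + c := by
  intro hsum              -- (L1) suppose, to the contrary, x = a + b + c
  obtain ⟨k, hk⟩ := ha    -- (L2) a = 2k
  obtain ⟨l, hl⟩ := hb    --      b = 2l
  obtain ⟨p, hp⟩ := hc    --      c = 2p
  obtain ⟨m, hm⟩ := hx    -- x is odd: x = 2m + 1
  -- (L3)-(L4): substituting gives x = 2(k + l + p), which is even, but also
  -- x = 2m + 1; `omega` finds that this linear system has no integer solution.
  omega
```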
Self-Explanation Compared with Other Comments


You must also be aware that the self-explanation strategy is not the same as 
monitoring or paraphrasing. These two methods will not help your learning 
to the same extent as self-explanation. 


Paraphrasing 
‘a, b, and c have to be positive or negative, even whole numbers.’ 


There is no self-explanation in this statement. No additional information
is added or linked. The reader merely uses different words to describe
what is already represented in the text by the words ‘even integers’. You
should avoid using such paraphrasing during your own proof
comprehension. Paraphrasing will not improve your understanding of the text as much
as self-explanation will.


Monitoring 
‘OK, I understand that 2(k + l + p) is an even integer.’ 


This statement simply shows the reader’s thought process. It is not the same
as self-explanation, because the student does not relate the sentence to
additional information in the text or to prior knowledge. Please concentrate on
self-explanation rather than monitoring.

A possible self-explanation of the same sentence would be: 


‘OK, 2(k + l + p) is an even integer because the sum of 3 integers is an
integer and 2 times an integer is an even integer.’


In this example the reader identifies and elaborates the main ideas in the text. 
They use information that has already been presented to understand the logic 
of the proof. 

This is the approach you should take after reading every line of a proof in 
order to improve your understanding of the material. 
Practice Proof 1


Now read this short theorem and proof and self-explain each line, either in 
your head or by making notes on a piece of paper, using the advice from the 
preceding pages. 


Theorem: There is no smallest positive real number. 


Proof: Assume, to the contrary, that there exists a smallest positive real 
number. 

Therefore, by assumption, there exists a real number r such that for every
positive number s, $0 < r \le s$.

Consider m = r/2. 

Clearly, 0 < m < r. 

This is a contradiction because m is a positive real number that is smaller 
than r. 

Thus there is no smallest positive real number. 
Practice Proof 2


Here’s another more complicated proof for practice. This time, a definition 
is provided too. Remember: use the self-explanation training after every line 
you read, either in your head or by writing on paper. 


Definition: An abundant number is a positive integer n whose divisors add
up to more than 2n. For example, 12 is abundant because
1 + 2 + 3 + 4 + 6 + 12 = 28 > 24.


Theorem: The product of two distinct primes is not abundant. 


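Proof: Let $n = p_1 p_2$, where $p_1$ and $p_2$ are distinct primes. Assume that $2 \le p_1$
and $3 \le p_2$.

The divisors of $n$ are $1$, $p_1$, $p_2$, and $p_1 p_2$.

Note that $\frac{p_1 + 1}{p_1 - 1}$ is a decreasing function of $p_1$.

So $\max\left(\frac{p_1 + 1}{p_1 - 1}\right) = \frac{2 + 1}{2 - 1} = 3$.

Hence $\frac{p_1 + 1}{p_1 - 1} \le p_2$.

So $p_1 + 1 \le p_1 p_2 - p_2$.

So $p_1 + 1 + p_2 \le p_1 p_2$.

So $1 + p_1 + p_2 + p_1 p_2 \le 2 p_1 p_2$.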
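A numerical sanity check of the definition and the theorem can be run alongside the proof, though it does not replace it. The Python sketch below computes divisor sums by brute force. It then confirms, for every product of two distinct primes below 60 (an arbitrary bound), that the four divisors add up to at most 2n. The case 6 = 2 · 3 shows that the bound is attained, which is why the inequalities in the proof are ≤ rather than <. The helper names are hypothetical.

```python
def divisor_sum(n):
    """Sum of all positive divisors of n, including 1 and n itself."""
    return sum(d for d in range(1, n + 1) if n % d == 0)

def is_abundant(n):
    # The definition in the text: the divisors add up to more than 2n.
    return divisor_sum(n) > 2 * n

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

primes = [p for p in range(2, 60) if is_prime(p)]

# Check the theorem for every product of two distinct primes below 60.
for i, p1 in enumerate(primes):
    for p2 in primes[i + 1:]:
        n = p1 * p2
        assert divisor_sum(n) == 1 + p1 + p2 + n  # exactly the four divisors
        assert not is_abundant(n)

print(divisor_sum(12), is_abundant(12))  # 28 True
print(divisor_sum(6), is_abundant(6))    # 12 False (equality: p1 = 2, p2 = 3)
```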


Remember... 


Using the self-explanation strategy has been shown to substantially improve
students’ comprehension of mathematical proofs. Try to use it every time
you read a proof in lectures, in your notes or in a book.